Re: [Apertium-stuff] Thoughts on UDPipe, Apertium modules and translation system for Interslavic

2021-12-16 Thread Daniel Swanson
... and right after sending my previous email I decided that creating
such a prototype was a more fun use of my flight home than
treebanking, so here's what I came up with:
https://github.com/mr-martian/UD-transfer

It probably needs another couple hours of work before it'll actually
do anything, but I should be able to manage that pretty soon (the
hardest part of any project is starting).

Daniel

On Thu, Dec 16, 2021 at 7:17 PM Daniel Swanson wrote:
>
> Greetings Apertiumers!
>
> Figuring out how to incorporate UD parsers into Apertium pipelines is
> something that's been on my todo list for a while, but with the
> unfortunate property that it keeps getting sidelined by projects that
> have deadlines.
>
> With regard to your specific issue, here are the options I can think of:
>
> 1. apertium-transfer / chunking
> The chunker can pretty much only process adjacent words. You can
> encode dependency labels to some extent (e.g.
> ^green/green<@amod>$), and the rules can refer to those
> tags, but I don't think there's any way to access the actual relations
> that isn't incredibly hacky and fragile.
>
> 2. apertium-recursive
> This was created precisely because chunking can't handle long-distance
> relationships, but to actually use it, you'd end up somehow encoding
> and then re-parsing the tree structure, which is still fairly fragile
> while also probably being an enormous waste of energy.
>
> 3. Constraint Grammar
> VISL CG-3 can manipulate dependency trees, and writing agreement rules
> would be fairly straightforward, though you'd have to write them from
> scratch rather than copying from existing sources.
>
> 4. Bug me to make a real solution
> Prototyping a pipeline module to do pretty much exactly what you're
> talking about is nominally fairly high on my todo list, and if someone
> is actually waiting for it there's a decent amount of hope that I'll
> actually start it rather than some other project.
>
> If your main concern is agreement, 3 strikes me as a pretty good
> option. On the other hand, if you actually need to modify the tree
> structure, 3 might get complicated, in which case I'd recommend 4.
>
> Daniel
>
> On Thu, Dec 16, 2021 at 5:20 PM Виктор Булатов wrote:
> >
> > Hi everyone. The Interslavic language is a constructed language that is 
> > created in such a way that people from Slavic countries are able to 
> > understand most of it without any prior education. It has a Wikipedia page 
> > and everything (maybe we will even have an ISO 639-3 code "ISV" in the
> > future, fingers crossed!).
> >
> > I'm looking into developing some sort of MT system for Interslavic (mainly 
> > the "Some Natural Slavic Language -> Interslavic" direction). I've managed
> > to cobble together a prototype with Russian UDPipe and ISV morphological data/rules
> > before finding out about Apertium (and you guys seem interesting).
> >
> > The thing is, Russian and Czech are probably the richest Slavic languages 
> > in terms of NLP resources. Apertium obviously isn't going to beat a 
> > dependency parser that was trained on >1M labeled sentences. So, I don't
> > really need any of the earlier stages of the Apertium pipeline. However, 
> > the chunking and multi-word-expression modules seem promising, especially 
> > given that I probably could re-use already existing rules (that are written 
> > for different Slavic languages, but it doesn't matter).
> >
> > So, my question is: is it possible to use the chunking module in isolation? 
> > Preferably in a way that allows manipulation of UDPipe's dependency trees? 
> > For example, to ensure gender agreement between a noun and attached 
> > adjectives.
> >
> > I would be happy to hear any other advice!


___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


Re: [Apertium-stuff] Thoughts on UDPipe, Apertium modules and translation system for Interslavic

2021-12-16 Thread Daniel Swanson
Greetings Apertiumers!

Figuring out how to incorporate UD parsers into Apertium pipelines is
something that's been on my todo list for a while, but with the
unfortunate property that it keeps getting sidelined by projects that
have deadlines.

With regard to your specific issue, here are the options I can think of:

1. apertium-transfer / chunking
The chunker can pretty much only process adjacent words. You can
encode dependency labels to some extent (e.g.
^green/green<@amod>$), and the rules can refer to those
tags, but I don't think there's any way to access the actual relations
that isn't incredibly hacky and fragile.

2. apertium-recursive
This was created precisely because chunking can't handle long-distance
relationships, but to actually use it, you'd end up somehow encoding
and then re-parsing the tree structure, which is still fairly fragile
while also probably being an enormous waste of energy.

3. Constraint Grammar
VISL CG-3 can manipulate dependency trees, and writing agreement rules
would be fairly straightforward, though you'd have to write them from
scratch rather than copying from existing sources.

4. Bug me to make a real solution
Prototyping a pipeline module to do pretty much exactly what you're
talking about is nominally fairly high on my todo list, and if someone
is actually waiting for it there's a decent amount of hope that I'll
actually start it rather than some other project.
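
For option 1, here is a minimal Python sketch of what encoding
dependency labels as tags could look like, assuming the parser output
is CoNLL-U and following the ^green/green<@amod>$ shape above. The
function name conllu_to_stream and the sample sentence are made up for
illustration. The sketch drops the POS and morphological tags a real
pipeline would carry and does not escape reserved stream characters.

```python
def conllu_to_stream(conllu: str) -> str:
    """Turn CoNLL-U token lines into ^form/lemma<@deprel>$ units.

    Hypothetical helper: only the dependency label is kept as a tag.
    """
    units = []
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        form, lemma, deprel = cols[1], cols[2], cols[7]
        units.append(f"^{form}/{lemma}<@{deprel}>$")
    return " ".join(units)


# Illustrative input, not taken from any treebank.
SAMPLE = "\n".join([
    "# text = green house",
    "1\tgreen\tgreen\tADJ\t_\t_\t2\tamod\t_\t_",
    "2\thouse\thouse\tNOUN\t_\t_\t0\troot\t_\t_",
])

if __name__ == "__main__":
    print(conllu_to_stream(SAMPLE))
```

Note that the HEAD column is thrown away here, which is the limitation
described under option 1: a rule can see that a word is an amod, but
not which word it modifies.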

If your main concern is agreement, 3 strikes me as a pretty good
option. On the other hand, if you actually need to modify the tree
structure, 3 might get complicated, in which case I'd recommend 4.
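
To illustrate the kind of agreement rule in question (independent of
which tool ends up running it), here is a rough Python sketch that
works directly on CoNLL-U rows: every amod dependent copies Gender
from its head. This is not CG-3 syntax and not how any Apertium module
does it; the helper names and the two-word example are assumptions
made up for illustration, and only the feature column is updated, so
regenerating the surface form would be a separate step.

```python
def parse_feats(feats: str) -> dict:
    """Parse a CoNLL-U FEATS string such as 'Case=Nom|Gender=Fem'."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def format_feats(feats: dict) -> str:
    """Serialize features back to CoNLL-U, sorted by feature name."""
    if not feats:
        return "_"
    items = sorted(feats.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={value}" for name, value in items)


def agree_amod(rows, features=("Gender",)):
    """Copy the listed features from each head onto its amod dependents.

    rows is a list of 10-column CoNLL-U token rows (lists of strings)
    for one sentence; the rows are modified in place and returned.
    """
    by_id = {row[0]: row for row in rows}
    for row in rows:
        if row[7] != "amod":
            continue
        head = by_id.get(row[6])
        if head is None:
            continue  # head is the root or missing from this sentence
        head_feats = parse_feats(head[5])
        dep_feats = parse_feats(row[5])
        for name in features:
            if name in head_feats:
                dep_feats[name] = head_feats[name]
        row[5] = format_feats(dep_feats)
    return rows


# Illustrative rows: the adjective starts out masculine, the noun is feminine.
rows = [
    ["1", "novy", "novy", "ADJ", "_",
     "Case=Nom|Gender=Masc|Number=Sing", "2", "amod", "_", "_"],
    ["2", "kniga", "kniga", "NOUN", "_",
     "Case=Nom|Gender=Fem|Number=Sing", "0", "root", "_", "_"],
]

if __name__ == "__main__":
    for row in agree_amod(rows):
        print("\t".join(row))
```

Passing features=("Gender", "Number", "Case") would extend the same
rule to the other agreement features.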

Daniel

On Thu, Dec 16, 2021 at 5:20 PM Виктор Булатов wrote:
>
> Hi everyone. The Interslavic language is a constructed language that is 
> created in such a way that people from Slavic countries are able to 
> understand most of it without any prior education. It has a Wikipedia page 
> and everything (maybe we will even have an ISO 639-3 code "ISV" in the
> future, fingers crossed!).
>
> I'm looking into developing some sort of MT system for Interslavic (mainly 
> the "Some Natural Slavic Language -> Interslavic" direction). I've managed to
> cobble together a prototype with Russian UDPipe and ISV morphological data/rules
> before finding out about Apertium (and you guys seem interesting).
>
> The thing is, Russian and Czech are probably the richest Slavic languages in 
> terms of NLP resources. Apertium obviously isn't going to beat a dependency 
> parser that was trained on >1M labeled sentences. So, I don't really need
> any of the earlier stages of the Apertium pipeline. However, the chunking and 
> multi-word-expression modules seem promising, especially given that I 
> probably could re-use already existing rules (that are written for different 
> Slavic languages, but it doesn't matter).
>
> So, my question is: is it possible to use the chunking module in isolation? 
> Preferably in a way that allows manipulation of UDPipe's dependency trees? 
> For example, to ensure gender agreement between a noun and attached 
> adjectives.
>
> I would be happy to hear any other advice!