Re: [Apertium-stuff] Secondary Tag Prefixes

2020-05-14 Thread Samuel Sloniker
+1

On Sun, May 10, 2020 at 10:14 AM Xavi Ivars  wrote:

> First of all, just to mention I don't consider myself a language developer
> (but someone who messes around everything).
>
> -  I think I would leave this for the "secondary tag" developer, similar
> to what we already do to the "primary tags" one. For example, no-one
> forbids currently having a primary tag with any symbol, as long as it's not
> a stream-related one (<,>,^,$,+).
> - Like Jonathan, I think we don't need to have things like
> . It's too long, and would probably clutter the stream
> too much. (Let's remember that, even if the stream is not meant to be
> "human read", it is somewhat "human readable", and it being as concise as
> possible helps.)
> - That said, I would *strongly encourage* the secondary tag developer to
> have meaningful secondary tag prefixes, the same way we have meaningful
> primary tags. While we don't have  or , we also don't
> have <€> and <£>, but  and . Having meaningful tags is an awesome
> feature of the stream, that makes it relatively simple to manually create
> input for any part of the pipeline (either to test a specific command, to
> write tests, ...).
>
> So I would *recommend* having short lowercase prefixes, that make it easy
> to understand (or, at least, remember once seen once) what the secondary
> tag is about.
>
>

Re: [Apertium-stuff] Secondary Tag Prefixes

2020-05-10 Thread Daniel Swanson
On Sun, May 10, 2020 at 6:15 AM Flammie A Pirinen  wrote:
>
> I don't personally find apertium stream format readable, if I need to
> make sense of it I will anyways have to preprocess a lot, enough that
> I'd say apertium stream format needs visualisation scripts to be
> readable. It's not very hard to have dev scripts for this. That being
> said, I don't find apertium stream format very machine readable either;
> with regexes you need tons of escapes and double escapes, with
> programming languages... well, you have to use regexes because it's not
> a standard format with a readily available parsing library or a format
> neatly designed for Python split() or C strtok(), or so... I'm fine with
> either special symbols or strings for whatever, as a purely personal
> preference I've been pro feature=value even before UD times but that's
> not important, as long as stuff is handleable with grep and sed without
> convoluted expressions it's all good, no? To that goal, on the question
> of having a known set of prefixes, I have always been of the opinion that
> any mature release-quality apertium stuff would follow the tags docu on
> the wiki[1], I would expect similar to be true for prefixes as well.
>

Regarding visualization, I've made a stream colorizer if anyone would
find such a thing useful: https://github.com/mr-martian/stream-color


___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


Re: [Apertium-stuff] Secondary Tag Prefixes

2020-05-10 Thread Xavi Ivars
First of all, just to mention I don't consider myself a language developer
(but someone who messes around everything).

-  I think I would leave this for the "secondary tag" developer, similar to
what we already do to the "primary tags" one. For example, no-one forbids
currently having a primary tag with any symbol, as long as it's not a
stream-related one (<,>,^,$,+).
- Like Jonathan, I think we don't need to have things like
. It's too long, and would probably clutter the stream
too much. (Let's remember that, even if the stream is not meant to be
"human read", it is somewhat "human readable", and it being as concise as
possible helps.)
- That said, I would *strongly encourage* the secondary tag developer to
have meaningful secondary tag prefixes, the same way we have meaningful
primary tags. While we don't have  or , we also don't
have <€> and <£>, but  and . Having meaningful tags is an awesome
feature of the stream, that makes it relatively simple to manually create
input for any part of the pipeline (either to test a specific command, to
write tests, ...).

So I would *recommend* having short lowercase prefixes, that make it easy
to understand (or, at least, remember once seen once) what the secondary
tag is about.
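
To make the recommendation above concrete, here is a minimal Python sketch
of a checker for it. Only the list of stream-related characters comes from
this mail; the function name check_tag, the "1 to 4 lowercase letters plus
colon" rule and the "sf" prefix are assumptions made up for illustration,
not an agreed convention.

```python
import re

# Characters listed in this mail as stream-related; a tag may not contain them.
RESERVED = set("<>^$+")

# Hypothetical convention: 1-4 lowercase ASCII letters, then ':', then a value.
PREFIXED = re.compile(r"^([a-z]{1,4}):(.+)$")


def check_tag(tag: str) -> str:
    """Classify the text found between '<' and '>' of a single tag."""
    if not tag or RESERVED & set(tag):
        return "invalid"
    if PREFIXED.match(tag):
        return "secondary"
    # Anything else is accepted as a primary tag, since nothing forbids it.
    return "primary"


print(check_tag("n"))        # a plain primary tag
print(check_tag("sf:отца"))  # hypothetical secondary tag with prefix "sf"
print(check_tag("a+b"))      # contains a reserved stream character
```

A check like this could sit in a test suite for a language pair, so the
choice of prefixes stays with the secondary tag developer while the
reserved characters are still caught early.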



Re: [Apertium-stuff] Secondary Tag Prefixes

2020-05-10 Thread Francis Tyers

On 2020-05-10 14:51, Samuel Sloniker wrote:
> Would it be worth designing a parsing library?


There is already
https://github.com/apertium/streamparser

for Python...

Fran




Re: [Apertium-stuff] Secondary Tag Prefixes

2020-05-10 Thread Samuel Sloniker
Would it be worth designing a parsing library?



Re: [Apertium-stuff] Secondary Tag Prefixes

2020-05-10 Thread Flammie A Pirinen
On Fri, May 08, 2020 at 04:50:45PM +0200, Tino Didriksen wrote:
> For khannatanmai's GSoC project, secondary tags will be implemented in a
> backwards compatible manner. That is in itself indisputable. But, there is
> a question of how the initial batch of secondary tags should look.
> 
> I feel they should be in the form of , as in a very short textual
> lower-case prefix, followed by :, followed by whatever value there is. Or
> even an upper-case prefix, as in  or .
> 
> spectie wants symbol prefixes in the form of <%:cdefg>.

I feel like this is just a bikeshed[0] issue, but since I want this
project to succeed I'll give my 2 cents / rants:

I don't personally find apertium stream format readable, if I need to
make sense of it I will anyways have to preprocess a lot, enough that
I'd say apertium stream format needs visualisation scripts to be
readable. It's not very hard to have dev scripts for this. That being
said, I don't find apertium stream format very machine readable either;
with regexes you need tons of escapes and double escapes, with
programming languages... well, you have to use regexes because it's not
a standard format with a readily available parsing library or a format
neatly designed for Python split() or C strtok(), or so... I'm fine with
either special symbols or strings for whatever, as a purely personal
preference I've been pro feature=value even before UD times but that's
not important, as long as stuff is handleable with grep and sed without
convoluted expressions it's all good, no? To that goal, on the question
of having a known set of prefixes, I have always been of the opinion that
any mature release-quality apertium stuff would follow the tags docu on
the wiki[1], I would expect similar to be true for prefixes as well.
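
As a rough illustration of the "split() without convoluted expressions"
point, here is a Python sketch that takes one analysis apart using only
str.partition and str.split. It assumes the textual prefix:value form Tino
describes, that the input contains no escaped angle brackets, and a
made-up prefix "sf"; the function name split_tags is also hypothetical.

```python
def split_tags(reading: str):
    """Split one analysis into (lemma, primary tags, secondary tags).

    Assumes tags look like <a><b><prefix:value> with no escaped '<' or '>'.
    """
    lemma, _, rest = reading.partition("<")
    # "n><m><sg><sf:x>" -> ["n", "m", "sg", "sf:x"]
    tags = rest.rstrip(">").split("><") if rest else []

    primary, secondary = [], {}
    for tag in tags:
        prefix, sep, value = tag.partition(":")
        # Assumed rule: a lowercase alphabetic prefix before ':' marks a secondary tag.
        if sep and prefix.isalpha() and prefix.islower():
            secondary[prefix] = value
        else:
            primary.append(tag)
    return lemma, primary, secondary


lemma, primary, secondary = split_tags("отец<n><m><sg><sf:отца>")
print(lemma, primary, secondary)
```

With escaped characters in the stream this shortcut stops being safe, which
is where a real parser (or the regexes complained about above) comes back in.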

One side note: I think there is a level of abstraction we often overlook
in these developments; a part of the language data developer base will
probably interact with these secondary things through the XML formats, if
I understand correctly? Surely one of the things that can be done,
regardless of what kind of stream format representation the secondary
stuff has, is to make the XML format part more self-documenting and the
stream format more readable? And eventually one could imagine tooling
and visualisations or whatnot to support whatever readable and parsable
formats, if enough stuff is in the XML sources.

so tl;dr: just pick whatever greppable stuff for the apertium stream format.

[0] 
[1] 

-- 
Regards, Flammie 
(Please note, that I will often include my replies inline instead of
top or bottom of the mail)




Re: [Apertium-stuff] Secondary Tag Prefixes

2020-05-09 Thread Jonathan Washington
Speaking as a language developer,¹ I prefer concise, textual tags.

E.g., I don't think  is good—it clogs the stream
with verbosity, as Fran points out.

On the other hand, I don't mind symbols here and there, like <§agent>.
But I don't think this is a good secondary tag, unless we make it very
explicit which Unicode spans/classes can be used to define secondary
tags.

I don't like tags like <:human>, unless we are explicit about
considering this a special way of encoding *semantic category*
information in secondary tags.

Fran, regarding these last two points, could you define what each
symbol in your example tags stands for, and what range/class of
symbols can continue to be used for secondary tags?
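
To show what "being explicit about Unicode classes" could look like in
practice, here is a small Python sketch that buckets the first character of
a tag by its Unicode general category. The function name prefix_kind and
the four buckets are assumptions for illustration only; nothing here is a
proposed rule.

```python
import unicodedata


def prefix_kind(tag: str) -> str:
    """Bucket the first character of a non-empty tag by Unicode general category."""
    category = unicodedata.category(tag[0])  # e.g. "Ll", "Po", "Sc"
    if category.startswith("L"):
        return "letter"
    if category.startswith("P"):
        return "punctuation"
    if category.startswith("S"):
        return "symbol"
    return "other"


for tag in ["@subj", "%:отца", ":human", "€", "human"]:
    print(tag, unicodedata.category(tag[0]), prefix_kind(tag))
```

One thing this makes visible: '@', '%' and ':' are classed as punctuation
by Unicode, while a currency sign like '€' is classed as a symbol, so a
rule phrased as "symbols only" would need to spell out which categories it
really means.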

¹ Qualification: not one that's developed a released pair from start
to finish, but that's mostly because my attention is too divided...  I
do have decent amounts of experience with the entire pipeline, though,
and experience working on all pipeline stages in several individual
pairs (including one approaching release / "in nursery").

--
Jonathan



Re: [Apertium-stuff] Secondary Tag Prefixes

2020-05-08 Thread Hèctor Alòs i Font
Message from Francis Tyers on Fri, 8 May 2020 at 18:05:

> On 2020-05-08 15:50, Tino Didriksen wrote:
> > For khannatanmai's GSoC project, secondary tags will be implemented in
> > a backwards compatible manner. That is in itself indisputable. But,
> > there is a question of how the initial batch of secondary tags should
> > look.
> >
> > I feel they should be in the form of , as in a very short
> > textual lower-case prefix, followed by :, followed by whatever value
> > there is. Or even an upper-case prefix, as in  or .
> >
> > spectie wants symbol prefixes in the form of <%:cdefg>.
> >
>
> [snip]
>
> > From a technical and scientific basis, textual prefixes are just
> > better. And yet, spectie wants symbol prefixes because he likes them.
> > I disagree. Hence, this mail asking for opinions.
> >
> > Do you language developers actually prefer symbol prefixes?
> >
>
> Tino misrepresented me slightly. I never proposed using the pound sign.
>
> My proposal was for:
>
>
> отец<@subj><§agent><%:отца><:human><:kin>
>
> If we have to have these "secondary tags"... which I have yet to be
> completely convinced of,
> I would like to have them be readable and not clutter the stream with
> unnecessary
> verbosity. There are a lot of rule-based formalisms out there that are
> impossible to read,
> having been dreamt up by people who don't actually spend a lot of time
> writing language
> data, and I would like to avoid that happening with Apertium.
>

Well, from a developer's point of view, I would very much like to be able
to get information like "human", "construction", "demonym", "material",
"musical instrument", etc., which I have to use for lexical selection and
also sometimes for transfer. It seems logical to me that this data would
some day be placed in the dictionary or in a kind of secondary dictionary.
In fact the trend is already to add more semantic information to words: for
example, in proper names we now often distinguish between first names,
surnames, place names, hydronyms, etc.

Personally, I don't have any preference in the syntax. I'm fine with any
method that is short, easy to type on any keyboard and that identifies a
tag as secondary.

Hèctor



> Again, and again I want to see a translation and a linguistic
> motivation. In an _actual_
> language pair, not in someone's imagination.
>
> We have a lot of modules that have been made but not reached use in a
> released pair,
> so I don't see how this should be different.
>
> Fran


Re: [Apertium-stuff] Secondary Tag Prefixes

2020-05-08 Thread Tanmai Khanna
>
> My proposal was for:
>
>
> отец<@subj><§agent><%:отца><:human><:kin>
>
> If we have to have these "secondary tags"... which I have yet to be
> completely convinced of,
>

What exactly is your hesitation here? I want to make sure you guys are
happy with the proposal before going ahead with it, but I'm not able to get
through to you with arguments about eliminating trimming and markup
handling. There's no regression, and there are clear benefits, which were
pointed out on IRC, with regard to large monodixes and small bidixes
creating a bottleneck that makes disambiguation and transfer demonstrably
worse. Also, we discussed that after eliminating trimming we can weigh the
analyses based on the bidix, so as to keep the benefits of trimming with
rare words and compounds. What is the cause of your hesitation to include
secondary tags?


> I would like to have them be readable and not clutter the stream with
> unnecessary
> verbosity. There are a lot of rule-based formalisms out there that are
> impossible to read,
> having been dreamt up by people who don't actually spend a lot of time
> writing language
> data, and I would like to avoid that happening with Apertium.
>
> Again, and again I want to see a translation and a linguistic
> motivation. In an _actual_
> language pair, not in someone's imagination.


I agree, and the contention here, more than just objective metrics, is also
which will be more readable. Some might feel the prefixes interfere with
the data, and some might feel they're self-documenting and clearer. Which
is why this mail was sent, to find out the views of the people who work
with actual language pairs and find out which is better for them.


> We have a lot of modules that have been made but not reached use in a
> released pair,
> so I don't see how this should be different.
>

Forgive me if I misunderstood you but this is a little disheartening for
me. I want to give this project my best, and I'm taking immense care to
convince everyone of the benefits, of how there will be no regression,
because I want to create a benefit for the released pairs. If this project
is going to be shelved then there really isn't much use of me worrying so
much about whether people see the linguistic motivation and benefits that
come out of this. I've always seen myself as a linguistics student first
and computer scientist second, and several times I have claimed that to
eliminate trimming, which is the primary linguistic motivation, propagating
surface form in the pipe is essential, for which secondary tags are
necessary.

I request the community to bombard this project with their skepticism,
their doubts, their suggestions, their criticism, and I promise there will
be thorough discussion and we will achieve an acceptable compromise on all
fronts. What I would hate is for this project to be finished and then never
reach released pairs because some fundamental flaw was never discussed.

Thanks and Regards,
Tanmai Khanna

-- 
*Khanna, Tanmai*


Re: [Apertium-stuff] Secondary Tag Prefixes

2020-05-08 Thread Francis Tyers

On 2020-05-08 15:50, Tino Didriksen wrote:

> For khannatanmai's GSoC project, secondary tags will be implemented in
> a backwards compatible manner. That is in itself indisputable. But,
> there is a question of how the initial batch of secondary tags should
> look.
>
> I feel they should be in the form of , as in a very short
> textual lower-case prefix, followed by :, followed by whatever value
> there is. Or even an upper-case prefix, as in  or .
>
> spectie wants symbol prefixes in the form of <%:cdefg>.

[snip]

> From a technical and scientific basis, textual prefixes are just
> better. And yet, spectie wants symbol prefixes because he likes them.
> I disagree. Hence, this mail asking for opinions.
>
> Do you language developers actually prefer symbol prefixes?



Tino misrepresented me slightly. I never proposed using the pound sign.

My proposal was for:

  
отец<@subj><§agent><%:отца><:human><:kin>
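
For anyone who wants to poke at the example line above, here is a minimal
Python sketch that splits each tag into a leading symbol prefix and a
value. Treating the leading run of non-word characters as "the prefix" is
an assumption made for this sketch, as is the function name symbol_tags;
the proposal itself does not define the symbols that way.

```python
import re

# Text between one '<' and the next '>' (no escaping handled).
TAG = re.compile(r"<([^<>]+)>")
# Assumption: the leading run of non-word characters is the symbol prefix.
SPLIT = re.compile(r"^(\W*)(.*)$")


def symbol_tags(unit: str):
    """Return the lemma and a list of (prefix, value) pairs for one unit."""
    lemma = unit.split("<", 1)[0]
    pairs = [SPLIT.match(tag).groups() for tag in TAG.findall(unit)]
    return lemma, pairs


lemma, pairs = symbol_tags("отец<@subj><§agent><%:отца><:human><:kin>")
print(lemma)
for prefix, value in pairs:
    print(repr(prefix), value)
```

Python's \W is Unicode-aware for str patterns, so Cyrillic letters count as
word characters and stay in the value part.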


If we have to have these "secondary tags"... which I have yet to be
completely convinced of, I would like to have them be readable and not
clutter the stream with unnecessary verbosity. There are a lot of
rule-based formalisms out there that are impossible to read, having been
dreamt up by people who don't actually spend a lot of time writing
language data, and I would like to avoid that happening with Apertium.

Again, and again I want to see a translation and a linguistic
motivation. In an _actual_ language pair, not in someone's imagination.

We have a lot of modules that have been made but not reached use in a
released pair, so I don't see how this should be different.

Fran




