[Wikimedia-l] Dictator

Milos Rancic Tue, 27 Oct 2015 07:56:20 -0700

No, it's not about Jimmy :P It's about the software for parsing
dictionaries. And we are currently in a not-so-stable phase of
switching from the name "dictator" to "dicteator" (the etymology is
"dictionary creator").


One of my strategic goals in relation to the movement itself is to
create a methodology for parsing dictionaries and adding them to the
Wikimedia projects (first Wiktionary, then Wikidata and likely some
other projects in the future), and to build a pool of programmers who
will keep that knowledge inside the movement.

So, besides the software itself [1], one of the tasks of Milos
Trifunovic, the programmer who is inserting data for the project
Wiktionary Meets Matica Srpska [2], was to write a white paper about
the process itself, which he did [3].

I know that dictionaries have been added to the Wiktionaries many
times before, but, as far as I know, no systematic effort was put
into disseminating that knowledge.

We are at the beginning of the process. So far, there are ~26k new
entries on Serbian Wiktionary. The code is in its initial useful
phase. By the end of this part of the process I expect from a few
hundred thousand to a few million new entries across all Wiktionary
editions (the most optimistic estimate is a few dozen million
entries; the estimate varies that much because of many factors,
including future community involvement; one million is a reachable
target).

Keep in mind that there are three different stages of the software,
with various levels of usefulness and complexity:

1) Adding the content into Wiktionary. This depends on the customs of
the particular Wiktionary, can be easily changed and doesn't require
much sophistication, as it's mostly about Pywikipediabot and wiki
syntax.

2) JSON intermediate storage. That's important as it's the most formal
way of representing dictionary data for future use, independent of the
destination platform. There is room for further development of the
particular format and your participation will be appreciated.

3) Parsing a particular dictionary. That's dictionary-specific, but a
number of methods could be shared for parsing other dictionaries. The
more dictionaries we parse, the more common methods we will have
developed.
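
To make the three stages more concrete, here is a minimal sketch in
Python. It is not taken from the dicteator code: the raw line format,
the JSON field names ("headword", "pos", "definitions") and the wiki
page layout are all assumptions made only for illustration. It goes
from the most specific stage (3) to the least specific one (1).

```python
import json
import re

# ASSUMPTION: a hypothetical raw line format, "headword (pos) def1; def2".
# Real dictionaries differ; this pattern is for illustration only.
ENTRY_RE = re.compile(
    r"^(?P<headword>\S+)\s+\((?P<pos>[^)]+)\)\s+(?P<defs>.+)$"
)


def parse_entry(line):
    """Stage 3: dictionary-specific parsing of one raw line."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized entry: {line!r}")
    definitions = [
        part.strip() for part in match.group("defs").split(";") if part.strip()
    ]
    # ASSUMPTION: these field names are hypothetical, not the project's format.
    return {
        "headword": match.group("headword"),
        "pos": match.group("pos"),
        "definitions": definitions,
    }


def to_json(entry):
    """Stage 2: platform-independent intermediate storage."""
    return json.dumps(entry, ensure_ascii=False, sort_keys=True)


def to_wikitext(entry):
    """Stage 1: render for one Wiktionary's (hypothetical) page layout."""
    lines = [f"== {entry['headword']} ==", f"=== {entry['pos']} ==="]
    lines += [f"# {definition}" for definition in entry["definitions"]]
    return "\n".join(lines)


entry = parse_entry("кућа (noun) house; home")
stored = to_json(entry)
page_text = to_wikitext(json.loads(stored))
print(stored)
print(page_text)
```

Only to_wikitext() would have to change for another Wiktionary
edition, and only parse_entry() for another dictionary; the JSON in
the middle stays the same. The actual upload would go through
Pywikipediabot and is left out of this sketch.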

So, if you are interested in the matter, please go to the talk page
[4] and give your suggestions. Also, don't hesitate to contact me
directly.

[1] https://github.com/Interglider/dictator
[2] https://meta.wikimedia.org/wiki/Grants:PEG/Interglider.ORG/Wiktionary_Meets_Matica_Srpska
[3] https://meta.wikimedia.org/wiki/Dicteator
[4] https://meta.wikimedia.org/wiki/Talk:Dicteator

