Hello Olya and everyone,

Very interesting project!
I am working on underserved languages in Wikipedia as well, mainly as part
of my research. In our most recent work we experimented with generating
Wikipedia summaries from Wikidata facts in underserved languages, which
worked quite well [1][2]. The idea builds on the ArticlePlaceholder
extension [3], which displays Wikidata triples on Wikipedia dynamically.
Learning from existing Wikipedia articles in the language has the advantage
that we can keep cultural and linguistic attributes as they are, which is
similar to the human-in-the-loop approach you suggest.
While no humans are needed for our summary generation, one of the main
drawbacks is that it currently produces only a single introductory sentence.
We are planning to experiment with extending this in a project at the end of
the year.
I am always happy to discuss these topics more, and would be interested to
hear whether something in our approach is helpful for you, and vice versa!

All the best,
Lucie


[1] preprint:
https://2018.eswc-conferences.org/wp-content/uploads/2018/02/ESWC2018_paper_131.pdf
[2] https://arxiv.org/pdf/1803.07116.pdf
[3] https://www.mediawiki.org/wiki/Extension:ArticlePlaceholder

On 18 June 2018 at 10:43, Gerard Meijssen <[email protected]> wrote:

> Hoi,
> There is no explicit link between the data and the lexicographic data. As
> a consequence it will not be easy to make use of the existing labels for
> automated translation services. This has been an explicit architectural
> decision.
>
> For me it will be interesting to learn how these links will be realised
> and how existing differences will be reconciled and how this will impact
> services like translation services.
> Thanks,
>      GerardM
>
> On 18 June 2018 at 08:52, Info WorldUniversity
> <info@worlduniversityandschool.org> wrote:
>
>> Hi Irene, Wikibabel, Gerd, and Wikidatans,
>>
>> How does Wikidata's new lexicographical project work with regard to
>> Swahili (since it is a Wikipedia / Wikidata language) and Google Translate
>> / GNMT re your "Our approach leverages Google Translate
>> <https://cloud.google.com/translate/docs/> to make English Wikipedia
>> articles accessible to underserved communities" (re:
>> https://medium.com/@oirzak/wikibabel-equalizing-information-access-on-a-budget-4038f750e90e)?
>>
>> Will a Wikibabel team you help create add Swahili lexemes to the
>> lexicographical project -
>> https://www.wikidata.org/wiki/Wikidata:Lexicographical_data - and then
>> Google GNMT - which is end-to-end translation software ...
>> https://1.bp.blogspot.com/-jwgtcgkgG2o/WDSBrwu9jeI/AAAAAAAABbM/2Eobq-N9_nYeAdeH-sB_NZGbhyoSWgReACLcB/s1600/image01.gif
>> (https://ai.googleblog.com/2016/11/zero-shot-translation-with-googles.html)
>> - use this new Swahili lexicographical data by processing this through
>> its algorithms?
>>
>> (WUaS seeks to facilitate machine translation in all 7097 living
>> languages, and by growing out of Google GNMT; WUaS donated itself for
>> co-development to Wikidata in 2015).
>>
>> Cheers,
>> Scott
>>
>>
>>
>> On Sun, Jun 17, 2018 at 10:28 PM, Gerard Meijssen <
>> [email protected]> wrote:
>>
>>> Hoi,
>>> I am giving a lot of attention to content that deals with Africa. As part
>>> of that I also target the Swahili Wikipedia [1] (I have not filled in all the
>>> red links yet). At this moment I am adding information in Wikidata about
>>> Tanzanian wards based on sw.wikipedia categories and templates.
>>>
>>> Many of the African language Wikipedias are struggling. By making the
>>> lists as complete as possible based on categories and lists, the
>>> information becomes more useful and better; it can be, and is, used in the
>>> same manner on multiple Wikipedias, at this moment the zu, yo, en and sw
>>> Wikipedias.
>>> As the information is made available using Listeria lists, the information
>>> gets updated as and when new information becomes available.
>>>
>>> Another notion of mine is that it will help with individual infoboxes,
>>> e.g. for politicians, or indeed Tanzanian wards .. :)
>>>
>>> NB I am a big fan of providing information using machine translation.
>>> However, PLEASE consider the lessons learned from the Cebuano Wikipedia and
>>> make the texts available in a cached way; not in the final form as saved
>>> text.
>>> Thanks,
>>>       GerardM
>>>
>>> PS when there is something where we can collaborate, please let me know.
>>>
>>>
>>> [1] https://sw.wikipedia.org/wiki/Mtumiaji:GerardM
>>>
>>> On 18 June 2018 at 01:12, Olya Irzak <[email protected]> wrote:
>>>
>>>> Dear Wikidata community,
>>>>
>>>> We're working on a project called Wikibabel to machine-translate parts
>>>> of Wikipedia into underserved languages, starting with Swahili.
>>>>
>>>> In hopes that some of our ideas can be helpful to machine translation
>>>> projects, we wrote a blogpost about how we prioritized which pages to
>>>> translate, and what categories need a human in the loop:
>>>> https://medium.com/@oirzak/wikibabel-equalizing-information-access-on-a-budget-4038f750e90e
>>>>
>>>> Rumor has it that the Wikidata community has thought deeply about
>>>> information access. We'd love your feedback on our work. Please let us know
>>>> about past / ongoing machine translation related projects so we can learn
>>>> from & collaborate with them.
>>>>
>>>> Best regards,
>>>> Olya & the Wikibabel crew
>>>>
>>>>
>>>> _______________________________________________
>>>> Wikidata mailing list
>>>> [email protected]
>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>>
>>>>
>>
>>
>> --
>>
>> --
>> - Scott MacLeod - Founder & President
>> - https://twitter.com/WorldUnivAndSch
>> - World University and School
>> - http://worlduniversityandschool.org
>> - http://scottmacleod.com
>>
>> - CC World University and School - like CC Wikipedia with best
>> STEM-centric CC OpenCourseWare - incorporated as a nonprofit university and
>> school in California, and is a U.S. 501 (c) (3) tax-exempt educational
>> organization.
>>
>


-- 
Lucie-Aimée Kaffee
