Dear all,
Thank you for your efforts. To learn more about word embeddings and semantic 
similarity, please refer to our research group's survey on the topic, 
available at 
https://www.sciencedirect.com/science/article/pii/S0952197619301745. If you 
would like us to work on using these techniques to enrich lexicographical 
data on Wikidata, we would be honoured to do so. However, we would face two 
main problems. The first is funding, and the second is that we need people 
to validate the information returned by these two techniques and adjust it 
if needed.
Yours Sincerely,
Houcemeddine Turki (he/him)
Medical Student, Faculty of Medicine of Sfax, University of Sfax, Tunisia
Undergraduate Researcher, UR12SP36
GLAM, Research and Education Coordinator, Wikimedia TN User Group
Member, Wiki Project Med
Member, WikiIndaba Steering Committee
Member, Wikimedia and Library User Group Steering Committee
Co-Founder, WikiLingua Maghreb
____________________
+21629499418


-------- Original message --------
From: Thomas Douillard <[email protected]>
Date: 2019/09/20 12:08 (GMT+01:00)
To: "Discussion list for the Wikidata project." <[email protected]>
Subject: [Wikidata] Lexical datas and automated learning – where it is answered 
to « I don’t believe in Wikidata senses developpment »


I recently read the French sentence « Je ne crois pas au développement des 
sens. » (translation: "I don’t believe senses will develop much") while 
following links from a Wikidata Weekly Summary to the slides of a French 
meeting about Wikidata lexicographical data. I do believe in it, regardless of 
the arguments presented in the slides, and I am writing this email to try to 
explain why.

I’m curious to know whether there is already some work on the automated 
discovery of lexicographical data / senses with the help of Wikidata items.

There are tools for automatically tagging terms with the corresponding Wikidata 
item; some have appeared on this mailing list and/or in the Wikidata Weekly 
Summaries.
There are also methods that can discover senses in texts using only the terms, 
with no reference to any external « sense », such as the word embedding 
methods described at 
https://towardsdatascience.com/word-embedding-with-word2vec-and-fasttext-a209c1d3e12c
These methods can discriminate between several usages of the same word 
according to the context.
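
As a rough illustration of discriminating usages by context, here is a minimal 
sketch in plain Python. It assumes tiny hand-made vectors and two hypothetical 
sense prototypes for the word "bank"; none of these values come from a trained 
model or from Wikidata, and a real system would learn the vectors from a corpus 
(for example with word2vec or fastText).

```python
import math

# Hypothetical 3-dimensional word vectors, hand-picked for illustration only.
VECTORS = {
    "river": (0.9, 0.1, 0.0),
    "water": (0.8, 0.2, 0.1),
    "money": (0.1, 0.9, 0.1),
    "loan": (0.0, 0.8, 0.2),
}

# Hypothetical prototype vectors for two senses of the ambiguous word "bank".
SENSES = {
    "bank (riverside)": (1.0, 0.0, 0.0),
    "bank (financial institution)": (0.0, 1.0, 0.0),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def context_vector(words):
    """Average the vectors of the known context words; None if none are known."""
    known = [VECTORS[w] for w in words if w in VECTORS]
    if not known:
        return None
    return tuple(sum(component) / len(known) for component in zip(*known))


def guess_sense(context_words):
    """Pick the sense whose prototype is closest to the averaged context."""
    ctx = context_vector(context_words)
    if ctx is None:
        return None
    return max(SENSES, key=lambda sense: cosine(ctx, SENSES[sense]))
```

Calling guess_sense(["river", "water"]) selects the riverside sense, while 
guess_sense(["money", "loan"]) selects the financial one; a context made only of 
unknown words yields None.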

Wikidata lexicographical data and Wikibase items could close the loop between 
the two methods and allow us to semi-automatically build tools that annotate 
texts with Wikidata items if there is something relevant in Wikidata and, if 
there is none, suggest adding data to Wikidata, whether it is a missing 
item or a missing sense for the term.
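
The loop described above could be sketched as follows. This is only a toy 
outline under stated assumptions: the lexicon is a small in-memory dictionary, 
the identifiers are placeholders rather than real Wikidata items, and the 
function name annotate is hypothetical. A real tool would query Wikidata and 
filter out function words before suggesting anything.

```python
import re

# Placeholder lexicon: the identifiers are NOT real Wikidata item IDs.
LEXICON = {
    "cat": "Q-PLACEHOLDER-1",
    "dog": "Q-PLACEHOLDER-2",
}


def annotate(text, lexicon):
    """Split text into annotated terms and terms to suggest for addition.

    Returns a pair (annotations, suggestions):
    - annotations: list of (term, item_id) for terms found in the lexicon
    - suggestions: list of distinct terms with no matching entry, meant to be
      reviewed by a human before anything is added
    """
    annotations = []
    suggestions = []
    for token in re.findall(r"\w+", text.lower()):
        if token in lexicon:
            annotations.append((token, lexicon[token]))
        elif token not in suggestions:
            suggestions.append(token)
    return annotations, suggestions
```

For the input "The cat chased the dog", the sketch annotates "cat" and "dog" 
and proposes "the" and "chased" for human review, which is where user feedback 
would come in.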

It may even be possible to store word embeddings generated by word2vec methods 
in Wikidata senses.

In conclusion, I think Wikidata senses will be used because they make it 
possible to close a gap. This does not depend only on strong involvement from 
a traditional volunteer lexicographic community. If researchers in the language 
community dive into this and develop algorithms and easy-to-use tools to share 
their lexicographical data in Wikidata, there could be a very positive feedback 
loop in which a large amount of data ends up being added to Wikidata, the 
stored data helps the algorithms enrich text annotations, for example, and 
missing data is semi-automatically added thanks to user feedback.

This is all just wishful thinking, but I thought it deserved to be shared. 
Hopefully this will launch at least a thread of ideas/comments here :)

Thomas
_______________________________________________
Wikidata mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata
