Thank you, Lucie, for taking the time to answer in detail. As I said, I am afraid I cannot really follow your paper, as I come from the humanities. And of course, a study of reader expectations was not part of your paper and research. Personally, I would start there, and I know that Wikipedia research has always paid more attention to contributors than to readers.
You are actually opening a new issue: what is useful for readers is one thing. The other is whether an ArticlePlaceholder helps an editor to improve an article. I would suppose that it is best to start the article on your own, but that may depend on the topic of the article.

I do speak Esperanto, as it happens. :-)
https://eo.wikipedia.org/wiki/Uzanto:Ziko

Kind regards,
Ziko

Lucie-Aimée Kaffee <kaf...@soton.ac.uk> wrote on Sat, 7 Apr 2018 at 16:24:

> Hello Ziko,
>
> Thanks for your mail! I responded inline below.
>
> On 6 April 2018 at 03:04, Ziko van Dijk <zvand...@gmail.com> wrote:
>
> > Hello,
> >
> > A most interesting thread, as it touches the topic from different
> > angles. I agree that it actually needs a study among readers about
> > their preferences.
>
> As I mentioned to Leila, the ESWC paper does work with editors, but I
> agree, more thought and work should be done on actual Wikipedia readers.
>
> > Personally, I may have some doubt whether it improves an
> > ArticlePlaceholder to create sentences from the data (as they did in
> > the geographical "articles" created by bots). The data itself is most
> > suitable for databases, to be looked up in a table. Reading "Berlin
> > has 3,500,000 inhabitants" is not really an improvement compared to
> > "Berlin / inhabitants: 3,500,000".
> >
> > Sentences have the most power when they combine information into
> > knowledge, as in "Berlin's population, currently 3,500,000, has been
> > much different during the Cold War because of the declining
> > attractiveness for businesses".
> >
> > In general, I would advise against one-sentence summaries; a reader
> > might be disappointed when he comes via Google to a website and then
> > only finds one sentence.
>
> Just to clarify: the summaries do generate information from multiple
> triples. Basically, this means the sentences are a bit more complex than
> just verbalizing one triple per sentence.
> However, even with a neural network, there is a limit to how much
> context we can produce for each sentence. Therefore, we integrated the
> question of how editors work with the data, as we see it as an important
> aspect of the workflow. Basically, ArticlePlaceholder can be a better
> option than no information at all, but the ideal would still be an
> actual editor picking up a topic and writing and maintaining a full
> article.
> Furthermore, in our current (theoretical) design we still keep all the
> information available from Wikidata in the form of triples. Therefore,
> we don't replace any information; we just add a sentence that is more
> reader-friendly and gives a first overview before the reader looks at
> the pure triples.
>
> > (I hope I understood the question well; I cannot follow the math in
> > your article. Is there anywhere an example of your "summaries" to
> > read?)
>
> The summaries are learned from the first sentence of Wikipedia articles,
> therefore they contain the same kind of structure and content. If you're
> able to read Arabic or Esperanto, generated sentences can be found here:
> https://github.com/pvougiou/Mind-the-Language-Gap/tree/master/Results/Our%20Model
>
> Cheers,
> Lucie
>
> > 2018-04-05 22:50 GMT+02:00 Leila Zia <le...@wikimedia.org>:
> >
> > > Hi Lucie-Aimée,
> > >
> > > Nice to see work in this direction is progressing. Some comments
> > > in-line.
> > >
> > > On Wed, Apr 4, 2018 at 7:49 AM, Lucie-Aimée Kaffee
> > > <kaf...@soton.ac.uk> wrote:
> > > >
> > > > Therefore, we worked on producing sentences from the information on
> > > > Wikidata in the given language.
> > > > We trained a neural network model; the details can be found in
> > > > the preprint of the NAACL paper here:
> > > > https://arxiv.org/abs/1803.07116
> > >
> > > It would be good to do human (both readers and editors, and perhaps
> > > both sets) evaluations for this research, too, to better understand
> > > how well the model is doing from the perspective of the experienced
> > > editors in some of the smaller languages as well as their readers. (I
> > > acknowledge that finding experienced editors when you go to small
> > > languages can become hard.)
> > >
> > > > Furthermore, we would love to hear your input: Do you believe
> > > > one-sentence summaries are enough, or can we serve the
> > > > communities' needs better with more than one sentence?
> > >
> > > This is a hard question to answer. :) The answer may rely on many
> > > factors including the language you want to implement such a system in
> > > and the expectation the users of the language have in terms of online
> > > content available to them in their language.
> > >
> > > > Is this still true if longer abstracts would be of lower
> > > > text quality?
> > >
> > > Same as above. You are signing yourself up for more experiments. ;)
> > >
> > > I would be interested to know:
> > > * What is the perception of the readers of a given language about
> > > Wikipedia if a lot of articles that they go to in their language have
> > > one sentence (to a good extent accurate), a few sentences but with
> > > some errors, more sentences with more errors, versus not finding the
> > > article they're interested in at all?
> > > * Related to the above: what is the error threshold beyond which the
> > > brand perceptions will turn negative (to be defined: maybe by
> > > measuring if the user returns in the coming week or month)? This may
> > > well be different in different languages and cultures.
> > > * Depending on the result of the above, we may want to look at
> > > offering the user the option to access that information, but outside
> > > of Wikipedia, or inside Wikipedia but very clearly labeled as Machine
> > > Generated as you do to some extent in these projects.
> > >
> > > > What other interesting use cases for such a technology in the
> > > > Wikimedia world can you imagine?
> > >
> > > The technology itself can have a variety of use-cases, including
> > > providing captions or summaries of photos even without layers of image
> > > processing applied to them.
> > >
> > > Best,
> > > Leila
> > >
> > > > [1] https://www.mediawiki.org/wiki/Extension:ArticlePlaceholder and
> > > > https://commons.wikimedia.org/wiki/File:Generating_Article_Placeholders_from_Wikidata_for_Wikipedia_-_Increasing_Access_to_Free_and_Open_Knowledge.pdf
> > > > [2]
> > > > https://eprints.soton.ac.uk/413433/1/Open_Sym_Short_Paper_Wikidata_Multilingual.pdf
> > > >
> > > > --
> > > > Lucie-Aimée Kaffee
> > > > Web and Internet Science Group
> > > > School of Electronics and Computer Science
> > > > University of Southampton
> > > > _______________________________________________
> > > > Wiki-research-l mailing list
> > > > Wiki-research-l@lists.wikimedia.org
> > > > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l