Re: [Wiki-research-l] Generation of Wikipedia Summaries from Wikidata in Underserved Languages using Deep Learning

Ziko van Dijk Sun, 08 Apr 2018 07:45:38 -0700

Thank you, Lucie, for taking the effort to answer in detail. As I said, I am
afraid I cannot really understand your paper, as I come from the humanities.
And of course, a study of reader expectations was not part of your paper
and research. Personally, I would start there, and I know that
Wikipedia research has always paid more attention to contributors than to
readers.


You are actually opening a new issue: what is useful for readers is
one thing. The other is whether an ArticlePlaceholder helps an editor to
improve an article. I would suppose that it is best to start the article on
your own, but that may depend on the topic of the article.

I do speak Esperanto, by chance. :-)
https://eo.wikipedia.org/wiki/Uzanto:Ziko

Kind regards,
Ziko

Lucie-Aimée Kaffee <kaf...@soton.ac.uk> wrote on Sat, 7 Apr 2018 at 16:24:

> Hello Ziko,
>
> Thanks for your mail! I responded inline below.
>
> On 6 April 2018 at 03:04, Ziko van Dijk <zvand...@gmail.com> wrote:
>
> > Hello,
> >
> > A most interesting thread, as it touches the topic from different angles. I
> > agree that it actually needs a study among readers about their preferences.
> >
> As I mentioned to Leila, the ESWC paper does work with editors, but I
> agree, more thought and work should be done on actual Wikipedia readers.
>
> >
> > Personally, I may have some doubt whether it improves an ArticlePlaceholder
> > to create sentences from the data (as they did in the geographical
> > "articles" created by bots). The data itself is most suitable for
> > databases, to be looked up in a table. Reading "Berlin has 3,500,000
> > inhabitants" is not really an improvement compared to "Berlin /
> > inhabitants: 3,500,000".
> >
> > Sentences have the most power when they combine information into knowledge,
> > as in "Berlin's population, currently 3,500,000, was much different
> > during the Cold War because of the declining attractiveness for
> > businesses".
> >
> > In general, I would advise against one-sentence-summaries; a reader might
> > be disappointed when he comes via Google to a website and then only finds
> > one sentence.
> >
>
> Just to clarify: the summaries are generated from multiple triples, which
> means the sentences are a bit more complex than just
> verbalizing one triple per sentence. However, even with a neural network,
> there is a limit to how much context we can produce for each sentence.
> Therefore, we integrated the question of how editors work with the data, as
> we see it as an important aspect of the workflow. Basically,
> ArticlePlaceholder can be a better option than no information at all, but
> the ideal would still be an actual editor picking up a topic and writing
> and maintaining a full article.
> Furthermore, in our current (theoretical) design we still keep all the
> information available from Wikidata in the form of triples. Therefore, we
> don't replace any information; we just add a sentence that is more
> reader-friendly and gives a first overview before the pure triples.
>
> >
> > (I hope I understood the question well; I cannot follow the math in your
> > article. Is there anywhere an example of your "summaries" to read?)
> >
> The summaries are learned from the first sentence of Wikipedia, therefore
> they contain the same kind of structure and content. If you're able to read
> Arabic or Esperanto, generated sentences can be found here:
>
> https://github.com/pvougiou/Mind-the-Language-Gap/tree/master/Results/Our%20Model
>
> Cheers,
> Lucie
>
> >
> > 2018-04-05 22:50 GMT+02:00 Leila Zia <le...@wikimedia.org>:
> >
> > > Hi Lucie-Aimée,
> > >
> > > Nice to see work in this direction is progressing. Some comments
> > > in-line.
> > >
> > > On Wed, Apr 4, 2018 at 7:49 AM, Lucie-Aimée Kaffee <kaf...@soton.ac.uk>
> > > wrote:
> > > >
> > > > Therefore, we worked on producing sentences from the information on
> > > > Wikidata in the given language. We trained a neural network model, the
> > > > details can be found in the preprint of the NAACL paper here:
> > > > https://arxiv.org/abs/1803.07116
> > >
> > > It would be good to do human (both readers and editors, and perhaps
> > > both sets) evaluations for this research, too, to better understand
> > > how well the model is doing from the perspective of the experienced
> > > editors in some of the smaller languages as well as their readers. (I
> > > acknowledge that finding experienced editors when you go to small
> > > languages can become hard.)
> > >
> > > > Furthermore, we would love to hear your input: Do you believe
> > > > one-sentence summaries are enough, or can we serve the communities'
> > > > needs better with more than one sentence?
> > >
> > > This is a hard question to answer. :) The answer may rely on many
> > > factors including the language you want to implement such a system in
> > > and the expectation the users of the language have in terms of online
> > > content available to them in their language.
> > >
> > > > Is this still true if longer abstracts would be of lower
> > > > text quality?
> > >
> > > Same as above. You are signing yourself up for more experiments. ;)
> > >
> > > I would be interested to know:
> > > * What is the perception of the readers of a given language about
> > > Wikipedia if a lot of articles that they go to in their language have
> > > one sentence (to a good extent accurate), a few sentences but with
> > > some errors, more sentences with more errors, versus not finding the
> > > article they're interested in at all?
> > > * Related to the above: what is the error threshold beyond which the
> > > brand perceptions will turn negative (to be defined, maybe by
> > > measuring whether the user returns in the coming week or month)? This may
> > > well be different in different languages and cultures.
> > > * Depending on the result of the above, we may want to look at
> > > offering the user the option to access that information, but outside
> > > of Wikipedia, or inside Wikipedia but very clearly labeled as Machine
> > > Generated as you do to some extent in these projects.
> > >
> > > > What other interesting use cases for such a technology in the
> > > > Wikimedia world can you imagine?
> > >
> > > The technology itself can have a variety of use-cases, including
> > > providing captions or summaries of photos even without layers of image
> > > processing applied to them.
> > >
> > > Best,
> > > Leila
> > >
> > > > [1] https://www.mediawiki.org/wiki/Extension:ArticlePlaceholder and
> > > > https://commons.wikimedia.org/wiki/File:Generating_Article_Placeholders_from_Wikidata_for_Wikipedia_-_Increasing_Access_to_Free_and_Open_Knowledge.pdf
> > > > [2]
> > > > https://eprints.soton.ac.uk/413433/1/Open_Sym_Short_Paper_Wikidata_Multilingual.pdf
> > > >
> > > > --
> > > > Lucie-Aimée Kaffee
> > > > Web and Internet Science Group
> > > > School of Electronics and Computer Science
> > > > University of Southampton
> > > > _______________________________________________
> > > > Wiki-research-l mailing list
> > > > Wiki-research-l@lists.wikimedia.org
> > > > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> > >
> > >
> >
>
>
>
> --
> Lucie-Aimée Kaffee
> Web and Internet Science Group
> School of Electronics and Computer Science
> University of Southampton
>
