Perhaps you could just imitate the QRpedia model, which tells the reader "this article is not available in your default language" and serves up links to the languages it *is* available in. After all, presence on Wikidata means presence on *at least one Wikipedia*, if I'm not mistaken.
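For illustration, here is a minimal sketch of that fallback in Python, assuming the public wbgetentities API; the requests dependency, the function name and the crude sitelink filter are my own choices for the example, not anything that exists today:

import requests

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def available_languages(title, source_site="enwiki"):
    # Ask Wikidata for the item linked to this title and return the
    # Wikipedia languages that already have an article on it.
    params = {
        "action": "wbgetentities",
        "sites": source_site,
        "titles": title,
        "props": "sitelinks",
        "format": "json",
    }
    data = requests.get(WIKIDATA_API, params=params, timeout=10).json()
    links = {}
    for entity in data.get("entities", {}).values():
        for dbname, link in entity.get("sitelinks", {}).items():
            # crude filter: "dewiki" -> "de"; it also matches non-Wikipedia
            # sites such as "commonswiki", so a real version needs a proper list
            if dbname.endswith("wiki"):
                links[dbname[:-4]] = link["title"]
    return links

# available_languages("Douglas Adams") might return
# {"en": "Douglas Adams", "de": "Douglas Adams", "fr": "Douglas Adams", ...}
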
2013/4/25, Erik Moeller <e...@wikimedia.org>:
> Millions of Wikidata stubs invade small Wikipedias .. Volapük
> Wikipedia now best curated source on asteroids .. new editors flood
> small wikis .. Google spokesperson: "This is out of control. We will
> shut it down."
>
> Denny suggested:
>
>>> II) develop a feature that blends into Wikipedia's search if an article
>>> about a topic does not exist yet, but we have data on Wikidata about
>>> that topic
>
> Andrew Gray responded:
>
>> I think this would be amazing. A software hook that says "we know X
>> article does not exist yet, but it is matched to Y topic on Wikidata"
>> and pulls out core information, along with a set of localised
>> descriptions... we gain all the benefit of having stub articles
>> (scope, coverage) without the problems of a small community having to
>> curate a million pages. It's not the same as hand-written content, but
>> it's immeasurably better than no content, or even an attempt at
>> machine-translating free text.
>>
>> XXX is [a species of: fish] [in the: Y family]. It [is found in: Laos,
>> Vietnam]. It [grows to: 20 cm]. (pictures)
>
> This seems very doable. Is it desirable?
>
> For many languages, it would allow hundreds of thousands of
> pseudo-stubs (not real articles stored in the DB, but generated from
> Wikidata) to be served to readers and crawlers that would otherwise
> not exist in that language.
>
> Looking back 10 years, User:Ram-Man was one of the first to generate
> thousands of en.wp articles from, in this case, US census data. It was
> controversial at the time and it stuck. Other Wikipedias have since
> then either allowed or prohibited bot-creation of articles on a
> project-by-project basis. It tends to lead to frustration when folks
> compare article counts and see artificial inflation by bot-created
> content.
>
> Does anyone know if the impact of bot-creation on (new) editor
> behavior has been studied? I do know that many of the Rambot articles
> were expanded over time, and I suspect many wouldn't have been if they
> hadn't turned up in search engines in the first place. On the flip
> side, a large "surface area" of content being indexed by search
> engines will likely also attract a fair bit of drive-by vandalism that
> may not be detected because those pages aren't watched.
>
> A model like the proposed one might offer a solution to a lot of these
> challenges. How I imagine it could work:
>
> * Templates could be defined for different Wikidata entities. We could
> make it possible to let users add links from items in Wikidata to
> Wikipedia articles that don't exist yet. (Currently this is
> prohibited.) If such a link is added, _and_ a relevant template is
> defined for the Wikidata entity type (perhaps through an entity
> type->template mapping), WP will render an article using that
> template, pulling structured info from Wikidata.
>
> * A lot of the grammatical rules would be defined in the template
> using checks against the Wikidata result. Depending on the complexity
> of grammatical variations beyond basics such as singular/plural this
> might require Lua scripting.
>
> * The article is served as a normal HTTP 200 result, cached, and
> indexed by search engines. In WP itself, links to the article might
> have some special affordance that suggests that they're neither
> ordinary red links nor existing articles.
>
> * When a user tries to edit the article, wikitext (or visual edit
> mode) is generated, allowing the user to expand or add to the
> automatically generated prose and headings. Such edits are tagged so
> they can more easily be monitored (they could also be gated by default
> if the vandalism rate is too high).
>
> * We'd need to decide whether we want these pages to show up in
> searches on WP itself.
>
> Advantages:
>
> * These pages wouldn't inflate page counts, but they would offer
> useful information to readers and be higher quality than machine
> translation.
>
> * They could serve as powerful lures for new editors in languages that
> are currently underrepresented on the web.
>
> Disadvantages/concerns:
>
> * Depending on implementation, I continue to have some concern about
> {{#property}} references ending up in article text (as opposed to
> templates); these concerns are consistent with the ones expressed in
> the en.wp RFC [1]. This might be mitigated if Visual Editor offers a
> super-intuitive in-place editing method. {{#property}} references in
> text could also be converted to their plain text representation the
> moment a page is edited by a human being (which would have its own set
> of challenges, of course).
>
> * How massive would these sets of auto-generated articles get? I
> suspect the technical complexity of setting up the templates and
> adding the links in Wikidata itself would act as a bit of a barrier to
> entry. But vast pseudo-article sets in tiny languages could pose
> operational challenges without adding a lot of value.
>
> * Would search engines penalize WP for such auto-generated content?
>
> Overall, I think it's an area where experimentation is merited, as it
> could not only expand information in languages that are
> underrepresented on the web, but also act as a force multiplier for
> new editor entrypoints. It also seems that a proof-of-concept for
> experimentation in a limited context should be very doable.
>
> Erik
>
> [1]
> https://en.wikipedia.org/wiki/Wikipedia:Requests_for_comment/Wikidata_Phase_2#Use_of_Wikidata_in_article_text
> --
> Erik Möller
> VP of Engineering and Product Development, Wikimedia Foundation
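
To make Andrew's bracketed example and the entity type -> template idea in the quoted mail a little more concrete, here is a rough Python sketch of the template-fill step. The template wording, the slot names and the assumption that Wikidata values have already been resolved to labels in the target language are all invented for illustration; an actual implementation would presumably be a Lua module maintained on-wiki.

TEMPLATES = {
    # keyed by a coarse entity type (e.g. derived from "instance of");
    # values are per-language sentence templates
    "taxon": {
        "en": "{label} is a species of {group} in the family {family}. "
              "It is found in {countries}. It grows to {length}.",
    },
}

def join_list(values):
    # very naive list agreement: "Laos" / "Laos and Vietnam" / "A, B and C"
    if len(values) == 1:
        return values[0]
    return ", ".join(values[:-1]) + " and " + values[-1]

def render_stub(entity_type, lang, resolved):
    # 'resolved' maps slot names to values already pulled from Wikidata
    # and resolved to labels in the target language
    slots = {name: (join_list(value) if isinstance(value, list) else value)
             for name, value in resolved.items()}
    return TEMPLATES[entity_type][lang].format(**slots)

print(render_stub("taxon", "en", {
    "label": "XXX",
    "group": "fish",
    "family": "Y",
    "countries": ["Laos", "Vietnam"],
    "length": "20 cm",
}))
# XXX is a species of fish in the family Y. It is found in Laos and
# Vietnam. It grows to 20 cm.

The naive join_list is where the per-language grammar rules from the second bullet (singular/plural and beyond) would live, next to the template itself.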
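And a sketch of the "convert {{#property}} references to plain text the moment a human edits the page" idea from the concerns list. The function name, the regular expression and the property ID in the example are placeholders; a real version would have to go through the parser rather than a regex, and would need to handle the non-simple forms of the parser function.

import re

PROPERTY_REF = re.compile(r"\{\{#property:(P\d+)\}\}")

def freeze_properties(wikitext, resolve):
    # Replace every {{#property:Pnnn}} with its currently rendered value so
    # that the first human edit starts from plain prose rather than magic words.
    return PROPERTY_REF.sub(lambda match: resolve(match.group(1)), wikitext)

# freeze_properties("It grows to {{#property:P1234}}.", lambda pid: "20 cm")
# returns "It grows to 20 cm."  (the property ID and resolver are dummies)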