On en.wiki, article importance is relative to some WikiProject. This is
encoded in https://en.wikipedia.org/wiki/Template:WPBannerMeta which
appears on 16% of all Wikipedia pages via specialisations such as
https://en.wikipedia.org/wiki/Template:WikiProject_New_Zealand

Within WikiProject New Zealand, there are articles which we think are very
important to us, but which we would never argue are even marginally
important on a global scale. Take for example
https://en.wikipedia.org/wiki/Pavlova_(food)

For the mathematically inclined, this is a classic case of a graph and its
many subgraphs.
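
A minimal sketch of that idea in Python, in case it helps: importance is
stored per (article, WikiProject) pair rather than per article, so each
project sees its own subgraph. The ratings below are invented for
illustration and are not the real banner values; `importance_in` is a
hypothetical helper, not anything that exists on-wiki.

```python
# Hypothetical ratings: (article, WikiProject) -> importance class.
# These values are made up for illustration only.
ratings = {
    ("Pavlova (food)", "WikiProject New Zealand"): "High",
    ("Pavlova (food)", "WikiProject Food and drink"): "Low",
    ("Pasteurization", "WikiProject Food and drink"): "High",
}


def importance_in(project, ratings):
    """Return {article: importance} restricted to one project's subgraph."""
    return {article: imp for (article, proj), imp in ratings.items() if proj == project}


nz = importance_in("WikiProject New Zealand", ratings)
food = importance_in("WikiProject Food and drink", ratings)
```

The same article can then carry a different rating in each subgraph
without any contradiction.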

cheers
stuart


--
...let us be heard from red core to black sky

On 27 April 2017 at 21:44, Gerard Meijssen <[email protected]>
wrote:

> Hoi,
> I have read the proposal and it leaves me wondering. Also the notion of
> importance is indeed neither easy nor obvious. I think the question what is
> most important is irrelevant depending on how you look at it. Subject can
> be irrelevant when you look at it from a personal perspective, looking at
> it from a particular perspective and indeed what seems relevant may become
> irrelevant or relevant over time. When you use metrics there will always be
> one way or another why it will be found to be problematic.
>
> When you consider Wikipedia, the difference it makes with similar resources
> is that its long tail is so much longer and still it is easy and obvious to
> show how the English Wikipedia's long tail is not long enough [1]. When you
> are looking for links and relevance, Wikidata includes data on all
> Wikipedias and thereby more avenues to establish relevance.
>
> Research has been done that shows that when people are suggested to write
> articles or amend articles, it works best when it is about subjects they
> care about. What people are interested in was based in the research on past
> behaviour. What we could do is flip this and ask people. Based on
> categories, on projects, whatever people do to categorise what is their
> interest. This will work on a micro level. On a meta level, it may drive
> cooperation when we enable people to share their interest (at that moment
> in time). On a macro level data may arrive at Wikidata and this will allow
> us to seek what articles include specific data (think date of death for
> instance). On a meta and macro level, we could ask readers what subjects
> they are missing. This would provide an additional incentive for people to
> write. For this last suggestion we could measure what people are missing.
>
> Anyway, relevance and importance depend on a point of view. When our
> community is enabled to make a difference, it will help us with our
> content. As a movement we know that there is enough that we do not properly
> cover. Advocating these issues and targeting and educating potential
> communities is where the WMF could play more of a role.
> Thanks,
>        GerardM
>
>
>
> [1]
> http://ultimategerardm.blogspot.nl/2017/04/wikidata-user-stories-sum-of-all.html
>
> On 26 April 2017 at 13:48, Jonathan Cardy <[email protected]>
> wrote:
>
> > I like to think that in time importance will win out over popularity. If
> > Wikipedia still exists in fifty or five hundred years' time and we are
> > still using pasteurisation and indeed still eating hydrocarbon based
> > foods, then I suspect the pop group you mention will be less frequently
> > read about than the pasteurisation process.
> >
> > In the meantime if we try to work it out at all it has to be something of
> > a judgement call, and one we will occasionally get wrong. Any guesses as
> > to which current branches of science will be as forgotten in a century as
> > phrenology is today?
> >
> > At an extreme the weekly top ten most viewed articles are a good guide to
> > what is trending in the popular cultures of India and the USA. I'm
> > assuming that most modern pop culture is inherently ephemeral. Of course
> > digital historians of future centuries may be rolling on the floor
> > laughing at this email, and the TV dramas currently being filmed may
> > still be widely studied and universally known classics while our leading
> > edge science lies buried in the foundations of their science.
> >
> > Regards
> >
> > Jonathan
> >
> >
> > > On 26 Apr 2017, at 08:50, Jane Darnell <[email protected]> wrote:
> > >
> > > Yes I totally agree that "importance is a relative metric rather than
> > > absolute." I also agree that incoming links and pageviews are not
> > > accurate measurements of "importance" for all of the reasons you
> > > mention. However, we are still a project that is actively exploring the
> > > universe of knowledge, and leaning heavily on academia and other
> > > established sources we must "boldly go where no man has gone before"
> > > (and please feel free to insert "white, euro-centric" before the man
> > > part). So do you have any suggestions what we could measure going
> > > forward that would cough up some interesting stats to monitor?
> > > Pagewatching is useful, but problematic because these are only assigned
> > > at page-creation, while some marginal editor interest might be expanded
> > > to whole categories (speaking as someone who has thousands of pages
> > > watchlisted on multiple projects). I like your thoughts about looking
> > > for key articles such as those used as the "main" article for a
> > > category or as the title of a navbox. I am looking for similar usages
> > > of paintings as a way to find popular painters or paintings rather than
> > > just those paintings which have articles written about them (which are
> > > often written for totally random reasons such as
> > > theft/sale/wikiproject).
> > >
> > > On Wed, Apr 26, 2017 at 5:39 AM, Kerry Raymond <[email protected]>
> > > wrote:
> > >
> > >> Just a few musings on the issue of Importance and how to research it ...
> > >>
> > >> I agree it is intuitive that importance is likely to be linked to
> > >> pageviews and inbound links but, as the preliminary experiment showed,
> > >> it's probably not that simple.
> > >>
> > >> Pageviews tell us something about importance to readers of Wikipedia,
> > >> while inbound links tell us something about importance to writers of
> > >> Wikipedia, and I suspect that writers are not a proxy for readers as
> > >> the editor surveys suggest that Wikipedia writers are not typical of
> > >> broader society on at least two variables: gender and level of
> > >> education (might be others, I can't remember).
> > >>
> > >> But I think importance is a relative metric rather than absolute. I
> > >> think that, by taking the mean value of importance across a number of
> > >> WikiProjects, the preliminary experiment may have lost something
> > >> because it tried (through averaging) to look at importance
> > >> "generally". I would suspect conducting an experiment considering only
> > >> the importance ratings wrt a single WikiProject would be more likely
> > >> to show correlation with pageviews (wrt other articles in that same
> > >> WikiProject) and inbound links. And I think there are two kinds of
> > >> inbound links to be considered, those coming from other articles
> > >> within the same WikiProject and those coming from outside that
> > >> WikiProject. I suspect different insights will be obtained by looking
> > >> at both types of inbound links separately rather than treating them as
> > >> an aggregate. I note also that WikiProjects are not entirely
> > >> independent of one another but have relationships between them. For
> > >> example, WikiProject Australian Roads describes itself as an
> > >> "intersection" (ha ha!) of WikiProject Highways and WikiProject
> > >> Australia, so I expect that we would find greater correlation in
> > >> importance between related WikiProjects than between unrelated
> > >> WikiProjects.
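
Inline, a minimal sketch of the split Kerry describes: counting inbound
links to an article separately for sources inside and outside one
WikiProject. The link pairs and membership set are toy data, and
`split_inbound` is a hypothetical helper name; real data would have to
come from the link tables and the project's banner tags.

```python
def split_inbound(article, links, members):
    """Count inbound links to `article` from inside and outside a project.

    links:   iterable of (source, target) title pairs
    members: set of article titles tagged by the WikiProject
    Returns (inside_count, outside_count).
    """
    inside = outside = 0
    for source, target in links:
        if target != article:
            continue  # not an inbound link to the article of interest
        if source in members:
            inside += 1
        else:
            outside += 1
    return inside, outside


# Toy data: A and B are in the project, C is not.
links = [("A", "X"), ("B", "X"), ("C", "X"), ("X", "A")]
members = {"A", "B", "X"}
counts = split_inbound("X", links, members)
```

Keeping the two counts apart lets each be correlated with that project's
importance ratings on its own, rather than as an aggregate.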
> > >>
> > >> When thinking about readers and pageviews, I think we have to ask
> > >> ourselves is there a difference between popularity and importance. Or
> > >> whether popularity *is* importance. I sense that, as a group of
> > >> educated people, those of us reading this research mailing list
> > >> probably do think there is a difference. Certainly if there is no
> > >> difference, then this research can stop now -- just judge importance
> > >> by pageviews. Let's assume a difference then. When looking at
> > >> pageviews of an article, they are not always consistent over time.
> > >> Here are the pageviews for Drottninggatan
> > >>
> > >> https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&range=latest-90&pages=Drottninggatan
> > >>
> > >> Why so interesting on 8 April? A terrorist attack occurred there. This
> > >> spike in pageviews occurs all the time when some topic is in the news
> > >> (even peripherally as in this case where it is not the article about
> > >> the terrorist attack but about the street in which it occurred). Did
> > >> the street become more "important"? I think it became more interesting
> > >> but not more important. So I think we do have to be careful to
> > >> understand that pageviews probably reflect interest rather than
> > >> importance. I note that The Chainsmokers (a music group with a number
> > >> of songs in the current USA music charts) gets many more Wikipedia
> > >> article pageviews than the Wikipedia article on Pasteurization but The
> > >> Chainsmokers are not rated as being of high importance by the relevant
> > >> WikiProjects while Pasteurization is very important in WikiProject
> > >> Food and Drink. Since pasteurisation prevents a lot of deaths, I think
> > >> we might agree that in the real world pasteurisation is more important
> > >> than a music group regardless of what pageviews tell us.
> > >>
> > >> https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&range=latest-90&pages=The_Chainsmokers|Pasteurization
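
Inline, one rough way to separate a news spike from steady readership
is to compare the peak day with the median day. This is only a sketch
under assumptions: the daily counts below are invented, not taken from
the pageviews tool, and `spike_ratio` is a hypothetical name.

```python
from statistics import median


def spike_ratio(daily_views):
    """Ratio of the busiest day to the median day.

    A large ratio suggests a short-lived burst of interest; a ratio near 1
    suggests steady readership.
    """
    return max(daily_views) / median(daily_views)


# Invented daily pageview counts, for illustration only.
quiet_street = [40, 42, 38, 41, 5000, 900, 120, 45, 39]
steady_topic = [800, 820, 790, 810, 805, 815, 795, 800, 810]
```

The median barely moves when one day explodes, so it is a more stable
baseline than the mean for this kind of comparison. It says nothing
about importance itself, only about how bursty the interest is.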
> > >>
> > >> Of course it matters for Wikipedia's success that our *popular*
> > >> articles are of high quality, but I think we have to be cautious about
> > >> pageviews being a proxy for importance.
> > >>
> > >> When we look at Wikipedia writers' decisions in tagging the importance
> > >> of articles to WikiProjects, what do we find? As we know, project tags
> > >> are often placed on new articles (and often not subsequently
> > >> reviewed). So while I find that quality tags are often out-of-date,
> > >> the importance seems to be pretty accurate even on new stub articles.
> > >> This is because it is the importance of the *topic* that is being
> > >> assessed which is independent of the Wikipedia article itself.
> > >> Provided the article is clear enough about what it is about and why it
> > >> matters (which is the traditional content of that first paragraph or
> > >> two and failing to provide it will likely result in speedy deletion of
> > >> the new article), assessment of the topic's importance can be made
> > >> even at new stub level. This tells us that importance for Wikipedia
> > >> writers is determined by something outside of Wikipedia (probably
> > >> their real-world knowledge of that topic space -- one assumes that
> > >> project taggers are quite interested in the topic space of that
> > >> project). While article quality hopefully improves over time, I would
> > >> be surprised if article importance greatly changed over time.
> > >> Obviously there are counter-examples. I am guessing Donald Trump's
> > >> article may have grown in importance over time but that's probably
> > >> because his lede para changed. Adding President of the USA into the
> > >> lede paragraph makes him much more important than he was before in the
> > >> real world and internal to Wikipedia he has acquired an inbound link
> > >> from the presumably high-importance President of the USA article. So I
> > >> think it might be interesting to study those articles whose importance
> > >> does change over time to see if there are any strong correlations with
> > >> what is happening to the article inside Wikipedia. I think this set of
> > >> importance-changing articles may be where we really learn what
> > >> Wikipedia article characteristics are strongly correlated to
> > >> "importance" given that importance itself appears to be pretty stable
> > >> for most articles.
> > >>
> > >> Although not stated explicitly, I imagine we believe that generally
> > >> less important articles tend to link to more important articles but
> > >> more important articles don't link to less important articles. And
> > >> hence in-bound links are likely to matter in assessing importance and
> > >> that in-bound links from "important" articles are more valuable than
> > >> in-bound links from less important articles (which creates something
> > >> of a bootstrapping problem), similar to the issue in Google's PageRank
> > >> algorithm. But I think we do have some information that Google doesn't
> > >> have. The average webpage does not have a lede paragraph that situates
> > >> the topic relative to other topics; a Wikipedia article does. If I
> > >> have to choose to define Thing X in terms of Thing Y, it tends to
> > >> suggest that Y is more important than X. If Y also defines itself in
> > >> terms of X, then it tends to suggest they are equivalent in importance
> > >> in some way. Indeed I suspect when we get to the VERY IMPORTANT topics
> > >> we will see this kind of circular definition (e.g. you see circular
> > >> definitions in Wikipedia around Philosophy and Knowledge). Aside, if
> > >> you have never done this before, try this experiment. Choose a random
> > >> article (left hand tool bar in Desktop Wikipedia), then click the
> > >> first link in the article that matters (i.e. ignore links in hatnotes
> > >> or links inside parentheses). Repeat this first link clicking and
> > >> sooner or later you will reach articles like Knowledge and Philosophy,
> > >> which all sit inside circular definition groups.
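
Inline, the first-link experiment can be sketched as a walk over a
mapping from each article to its first lede link, stopping when an
article repeats. The mapping below is a toy example I made up, not the
real first links of those articles, and `follow_first_links` is a
hypothetical helper.

```python
def follow_first_links(start, first_link):
    """Follow first links from `start` until an article repeats.

    first_link: dict mapping an article title to the first link in its lede.
    Returns (path, cycle): the articles visited in order, and the circular
    definition group the walk ends in (empty if it hits a dead end).
    """
    path = []
    position = {}  # article -> index in path, used to cut out the cycle
    current = start
    while current in first_link and current not in position:
        position[current] = len(path)
        path.append(current)
        current = first_link[current]
    if current in position:
        return path, path[position[current]:]
    path.append(current)  # dead end: no known first link for this article
    return path, []


# Toy first-link mapping, invented for illustration.
first_link = {
    "Pavlova": "Dessert",
    "Dessert": "Food",
    "Food": "Knowledge",
    "Knowledge": "Philosophy",
    "Philosophy": "Knowledge",
}
path, cycle = follow_first_links("Pavlova", first_link)
```

On this toy data the walk ends in the two-article loop between Knowledge
and Philosophy, which is the kind of circular definition group described
above.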
> > >>
> > >> If we look at the Donald Trump article, his first sentence contains
> > >> only two links, one to List of Presidents of the USA and the other to
> > >> President of the USA. If we look at those two articles, we find that
> > >> both of them mention Donald Trump in their lede paras (although not as
> > >> early as the first sentence) and before mentions of any other US
> > >> President elsewhere in the article. Which is consistent with what we
> > >> know about the real world: the role of the President is more important
> > >> than its officeholders and the current officeholder has more
> > >> importance than a past officeholder. So topic importance does seem to
> > >> be skewed towards the "present day".
> > >>
> > >> So I suspect the links in the lede paras are of greater relevance to
> > >> the assessment of importance than links further down in the article
> > >> which will more likely relate to details of a topic and may include
> > >> examples and counter-examples (this is a way in which a high
> > >> importance article may mention much lower importance articles).
> > >> However, we do have to be a little bit careful here because of the MoS
> > >> practice of not linking very common terms. For example, an Australian
> > >> article will often refer to Australia in the lede para but it will
> > >> almost certainly not be linked to the Australia article (and any
> > >> attempt to add such a link will likely see it removed with an edit
> > >> summary that mentions [[WP:Overlinking]]) whereas there is no problem
> > >> if you link to an Australian state article, e.g. New South Wales. So
> > >> we might find that some very important topics that often appear in
> > >> ledes might get fewer links than you might expect because of the MoS
> > >> policies on overlinking, which may be a problem when working with
> > >> inbound links. It may be that for "very common topics" the presence of
> > >> the article title (or its synonyms) in the lede may have to be
> > >> considered as if it were an in-bound link for statistical research
> > >> purposes.
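
Inline, a minimal sketch of treating an unlinked lede mention as a
pseudo inbound link: count the other articles whose lede text contains
the title (or a synonym) as a whole word or phrase. The lede texts are
invented and `lede_mentions` is a hypothetical helper; a real run would
need the lede extracted from each article.

```python
import re


def lede_mentions(title, ledes, synonyms=()):
    """Count other articles whose lede mentions `title` or one of its synonyms.

    ledes: dict mapping article title -> plain text of its lede paragraph.
    Matching is on whole words, so "Australian" does not count as "Australia".
    """
    terms = [title, *synonyms]
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b")
    return sum(
        1
        for name, text in ledes.items()
        if name != title and pattern.search(text)
    )


# Invented lede texts, for illustration only.
ledes = {
    "Wagga Wagga": "Wagga Wagga is a city in New South Wales, Australia.",
    "Brisbane": "Brisbane is the capital of Queensland, Australia.",
    "Pavlova (food)": "Pavlova is a meringue-based dessert.",
}
australia_mentions = lede_mentions("Australia", ledes)
nsw_mentions = lede_mentions("New South Wales", ledes, synonyms=("NSW",))
```

Plain string matching will of course miss disambiguation problems (a
title that is also a common word), so this is a starting point rather
than a substitute for real links.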
> > >>
> > >> Given all of the above, perhaps the most interesting group of articles
> > >> to study in Wikipedia are those articles whose manually-assessed
> > >> importance has changed over the life of the article AND which were NOT
> > >> current topics in the lifetime of Wikipedia (given the influence of
> > >> "current" on importance). But having said that, I wonder if that group
> > >> of articles actually exists. Recently a newish Australian contributor
> > >> expressed disappointment that all the new articles they had created
> > >> were tagged (by others) as of Low Importance. My instinctive reply was
> > >> "that's normal, I think of the thousands of articles I have started
> > >> only a couple are even rated as Mid importance, this is because the
> > >> really important articles were all started long ago precisely because
> > >> they were important". I suspect topics that are very important (for
> > >> reasons other than short-lived importance due to being "current" in
> > >> the lifetime of Wikipedia) will generally show up as having started
> > >> early in Wikipedia's life and that those that become more/less
> > >> important over time will be largely linked to becoming or ceasing to
> > >> be "current" topics. E.g. article Pasteurization started in May 2001
> > >> saying nothing more than "Pasteurization is the process of killing off
> > >> bacteria in milk by quickly heating it to a near boiling temperature,
> > >> then quickly cooling it again before the taste and other desirable
> > >> properties are affected. The process was named after its inventor,
> > >> French scientist Louis Pasteur. See also dairy products." The links in
> > >> this very first version are still present in its lede paragraph today,
> > >> suggesting our understanding of "non-current" topics is stable and
> > >> hence initial importance determinations can probably be accurately
> > >> made. For Pasteurization the Talk page shows it was not project-tagged
> > >> until 2007 when it was assigned High Importance as its first
> > >> assessment.
> > >>
> > >> I suspect we will find that initial manual assessment of article
> > >> importance will be pretty accurate for most articles. And I suspect if
> > >> we plot initial importance assessments against time of assessment, we
> > >> will find the higher importance articles commenced life on Wikipedia
> > >> earlier than the lower importance articles. If I am correct, then
> > >> there isn't a lot of value in machine-assessment of importance of
> > >> topics because it relates to factors external to Wikipedia and often
> > >> does not change over time and therefore can often be correctly
> > >> assessed manually even on new stub articles (and any unassessed
> > >> articles can probably be rated as Low Importance as statistically
> > >> that's almost certainly going to be correct). If a topic becomes more
> > >> important due to "current" events, then invariably that article will
> > >> be updated by many people and one of them will sooner or later
> > >> manually adjust its importance. What is less likely to happen is
> > >> re-assessing downwards of Importance when an important "current" topic
> > >> loses its importance when it is no longer current, e.g. are former
> > >> American presidents like Barack Obama or George W Bush or further back
> > >> less important now? These articles will not be updated frequently once
> > >> the topic is no longer in the news and therefore it is less likely an
> > >> editor will notice and manually downgrade the importance, so there may
> > >> be a greater role for machine-assessment in downgrading importance
> > >> rather than upgrading importance.
> > >>
> > >> Another area where there might be a role for machine-assessed
> > >> importance is in regard to POV-pushing, where a POV-motivated editor
> > >> might change the manual-assessment importance of articles to be higher
> > >> or lower based on their POV (e.g. my political party is Top
> > >> Importance, other parties are of Low Importance). I suspect that often
> > >> a page watcher would correct or at least question that kind of
> > >> re-assessment. However, on articles with few active pagewatchers you
> > >> might get away with POV-pushing the article's importance tag because
> > >> nobody noticed. In this situation, a machine assessment could be
> > >> useful in spotting this kind of thing.
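
Inline, a sketch of how such spotting might work, assuming some model
already yields a predicted class per article: flag the articles where
the manual tag and the prediction are far apart. The four classes are
the usual Low/Mid/High/Top scale; the example ratings, the `min_gap`
threshold and the `flag_discrepancies` name are all assumptions made
for illustration.

```python
# Ordinal positions of the importance classes used on project banners.
LEVELS = {"Low": 0, "Mid": 1, "High": 2, "Top": 3}


def flag_discrepancies(manual, predicted, min_gap=2):
    """Return (article, manual, predicted) where the ratings differ by >= min_gap levels.

    manual, predicted: dicts mapping article title -> importance class.
    Articles without a prediction are skipped.
    """
    flagged = []
    for article, rating in manual.items():
        if article not in predicted:
            continue
        gap = abs(LEVELS[rating] - LEVELS[predicted[article]])
        if gap >= min_gap:
            flagged.append((article, rating, predicted[article]))
    return sorted(flagged)


# Hypothetical ratings, invented for illustration.
manual = {"My Party": "Top", "Other Party": "Low", "Pasteurization": "High"}
predicted = {"My Party": "Mid", "Other Party": "Mid", "Pasteurization": "High"}
suspicious = flag_discrepancies(manual, predicted)
```

A flag is only a prompt for a human to look; a large gap could just as
easily mean the prediction is wrong.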
> > >>
> > >> This suggests that another metric of interest to importance might be
> > >> number of pagewatchers, although I suspect that pagewatching may
> > >> relate more to caring about the article than to caring about the
> > >> topic. And one has to be careful to distinguish active pagewatchers
> > >> (those who actually do review changes on their watchlists) from those
> > >> who don't, as that may make a difference (although I am not sure we
> > >> can really tell which pagewatchers are truly actively reviewing as a
> > >> "satisfactory review" doesn't leave a trace whereas an
> > >> "unsatisfactory" review is likely to lead to a relatively soon revert
> > >> or some other change to the article, the article Talk or the User Talk
> > >> of reviewed contributor which may be detectable).
> > >>
> > >> The other aspect of articles that occurs to me as being possibly
> > >> linked to importance of the topic would be use of the article as the
> > >> "main" article for a category or as the title of a navbox (as it
> > >> suggests that the articles in the category or navbox are in some way
> > >> subordinate to the main/title article). Similarly for list articles,
> > >> the "type" of the list is often more important than its instances.
> > >>
> > >> Kerry
> > >>
> > >> -----Original Message-----
> > >> From: Wiki-research-l [mailto:wiki-research-l-[email protected]]
> > >> On Behalf Of Morten Wang
> > >> Sent: Friday, 21 April 2017 6:04 AM
> > >> To: Research into Wikimedia content and communities <[email protected]>
> > >> Subject: Re: [Wiki-research-l] Project exploring automated classification of article importance
> > >>
> > >> Hi Pine,
> > >>
> > >> These are great pointers to existing practices on enwiki, some of
> > >> which I've been looking for and/or missed, thanks!
> > >>
> > >>
> > >> Cheers,
> > >> Morten
> > >>
> > >>> On 19 April 2017 at 22:35, Pine W <[email protected]> wrote:
> > >>>
> > >>> Hi Nettrom,
> > >>>
> > >>> A few resources from English Wikipedia regarding article importance
> > >>> as ranked by humans:
> > >>>
> > >>> https://en.wikipedia.org/wiki/Wikipedia:Vital_articles
> > >>>
> > >>> https://en.wikipedia.org/wiki/Wikipedia:Version_1.0_Editorial_Team/Release_Version_Criteria#Priority_of_topic
> > >>>
> > >>> https://en.wikipedia.org/wiki/Wikipedia:WikiProject_assessment#Statistics
> > >>>
> > >>> I infer from the ENWP Wikicup's scoring protocol that for purposes of
> > >>> the competition, an article's "importance" is loosely inferred from
> > >>> the number of language editions of Wikipedia in which the article
> > >>> appears:
> > >>> https://en.wikipedia.org/wiki/Wikipedia:WikiCup/Scoring#Bonus_points
> > >>>
> > >>> HTH,
> > >>>
> > >>> Pine
> > >>>
> > >>>
> > >>>> On Tue, Apr 18, 2017 at 4:17 PM, Morten Wang <[email protected]> wrote:
> > >>>>
> > >>>> Hello everyone,
> > >>>>
> > >>>> I am currently working with Aaron Halfaker and Dario Taraborelli at
> > >>>> the Wikimedia Foundation on a project exploring automated
> > >>>> classification of article importance. Our goal is to characterize
> > >>>> the importance of an article within a given context and design a
> > >>>> system to predict a relative importance rank. We have a project page
> > >>>> on meta[1] and welcome comments or thoughts on our talk page. You
> > >>>> can of course also respond here on wiki-research-l, or send me an
> > >>>> email.
> > >>>>
> > >>>> Before moving on to model-building I did a fairly thorough
> > >>>> literature review, finding a myriad of papers spanning several
> > >>>> disciplines. We have a draft literature review also up on meta[2],
> > >>>> which should give you a reasonable introduction to the topic. Again,
> > >>>> comments or thoughts (e.g. papers we’ve missed) on the talk page,
> > >>>> mailing list, or through email are welcome.
> > >>>>
> > >>>> Links:
> > >>>>
> > >>>>   1. https://meta.wikimedia.org/wiki/Research:Automated_classification_of_article_importance
> > >>>>   2. https://meta.wikimedia.org/wiki/Research:Studies_of_Importance
> > >>>>
> > >>>> Regards,
> > >>>> Morten
> > >>>> [[User:Nettrom]] aka [[User:SuggestBot]]
> > >>>> _______________________________________________
> > >>>> Wiki-research-l mailing list
> > >>>> [email protected]
> > >>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
