On en.wiki, article importance is relative to some WikiProject. This is encoded in https://en.wikipedia.org/wiki/Template:WPBannerMeta which appears on 16% of all Wikipedia pages via specialisations such as https://en.wikipedia.org/wiki/Template:WikiProject_New_Zealand
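
A minimal sketch of what that per-project encoding looks like in practice, assuming banners appear in talk-page wikitext as flat templates carrying an `importance=` parameter. The function name, the regular expression and the sample wikitext are all made up for illustration; real banners can nest other templates or sit inside a banner shell, which this does not handle.

```python
import re

# Matches a flat banner such as {{WikiProject New Zealand|class=B|importance=High}}.
# Simplification (assumption): no nested templates inside the banner.
BANNER = re.compile(r"\{\{\s*WikiProject ([^|}]+)((?:\|[^|}]*)*)\}\}", re.IGNORECASE)


def importance_by_project(talk_wikitext):
    """Return {project name: importance rating} for each banner found."""
    ratings = {}
    for match in BANNER.finditer(talk_wikitext):
        project = match.group(1).strip()
        for param in match.group(2).split("|"):
            key, sep, value = param.partition("=")
            if sep and key.strip().lower() == "importance":
                ratings[project] = value.strip()
    return ratings


# Hypothetical talk-page wikitext; the ratings are invented, not real assessments.
sample = """{{WikiProject New Zealand|class=B|importance=High}}
{{WikiProject Food and drink|class=B|importance=Low}}"""

print(importance_by_project(sample))
```

The point of returning a mapping rather than a single value is that one article carries one rating per project, so any analysis has to pick the project first.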

Within WikiProject New Zealand, there are articles which we think are very important to us, which we would never argue are even marginally important on a global scale. Take for example https://en.wikipedia.org/wiki/Pavlova_(food)

For the mathematically inclined, this is a classic case of a graph and many subgraphs.

cheers
stuart

--
...let us be heard from red core to black sky

On 27 April 2017 at 21:44, Gerard Meijssen <[email protected]> wrote:

> Hoi,
> I have read the proposal and it leaves me wondering. Also the notion of importance is indeed neither easy nor obvious. I think the question what is most important is irrelevant depending on how you look at it. Subject can be irrelevant when you look at it from a personal perspective, looking at it from a particular perspective and indeed what seems relevant may become irrelevant or relevant over time. When you use metrics there will always be one way or another why it will be found to be problematic.
>
> When you consider Wikipedia, the difference it makes with similar resources is that its long tail is so much longer and still it is easy and obvious to show how the English Wikipedia's long tail is not long enough [1]. When you are looking for links and relevance, Wikidata includes data on all Wikipedias and thereby more avenues to establish relevance.
>
> Research has been done that shows that when people are suggested to write articles or amend articles, it works best when it is about subjects they care about. What people are interested in was based in the research on past behaviour. What we could do is flip this and ask people. Based on categories, on projects, whatever people do to categorise what is their interest. This will work on a micro level. On a meta level, it may drive cooperation when we enable people to share their interest (at that moment in time).
> On a macro level data may arrive at Wikidata and this will allow us to seek what articles include specific data (think date of death for instance). On a meta and macro level, we could ask readers what subjects they are missing. This would provide an additional incentive for people to write. For this last suggestion we could measure what people are missing.
>
> Anyway, relevance and importance depend on a point of view. When our community is enabled to make a difference, it will help us with our content. As a movement we know that there is enough that we do not properly cover. Advocating these issues and targeting and educating potential communities is where the WMF could play more of a role.
> Thanks,
> GerardM
>
> [1] http://ultimategerardm.blogspot.nl/2017/04/wikidata-user-stories-sum-of-all.html
>
> On 26 April 2017 at 13:48, Jonathan Cardy <[email protected]> wrote:
>
> > I like to think that in time importance will win out over popularity. If Wikipedia still exists in fifty or five hundred years' time and we are still using pasteurisation and indeed still eating hydrocarbon based foods, then I suspect the pop group you mention will be less frequently read about than the pasteurisation process.
> >
> > In the meantime if we try to work it out at all it has to be something of a judgement call, and one we will occasionally get wrong. Any guesses as to which current branches of science will be as forgotten in a century as phrenology is today?
> >
> > At an extreme the weekly top ten most viewed articles are a good guide to what is trending in the popular cultures of India and the USA. I'm assuming that most modern pop culture is inherently ephemeral.
> > Of course digital historians of future centuries may be rolling on the floor laughing at this email, and the TV dramas currently being filmed may still be widely studied and universally known classics while our leading edge science lies buried in the foundations of their science.
> >
> > Regards
> >
> > Jonathan
> >
> > > On 26 Apr 2017, at 08:50, Jane Darnell <[email protected]> wrote:
> > >
> > > Yes I totally agree that "importance is a relative metric rather than absolute." I also agree that incoming links and pageviews are not accurate measurements of "importance" for all of the reasons you mention. However, we are still a project that is actively exploring the universe of knowledge, and leaning heavily on academia and other established sources we must "boldly go where no man has gone before" (and please feel free to insert "white, euro-centric" before the man part). So do you have any suggestions what we could measure going forward that would cough up some interesting stats to monitor? Pagewatching is useful, but problematic because these are only assigned at page-creation, while some marginal editor interest might be expanded to whole categories (speaking as someone who has thousands of pages watchlisted on multiple projects). I like your thoughts about looking for key articles such as those used as the "main" article for a category or as the title of a navbox. I am looking for similar usages of paintings as a way to find popular painters or paintings rather than just those paintings which have articles written about them (which are often written for totally random reasons such as theft/sale/wikiproject).
> > >
> > > On Wed, Apr 26, 2017 at 5:39 AM, Kerry Raymond <[email protected]> wrote:
> > >
> > >> Just a few musings on the issue of Importance and how to research it ...
> > >>
> > >> I agree it is intuitive that importance is likely to be linked to pageviews and inbound links but, as the preliminary experiment showed, it's probably not that simple.
> > >>
> > >> Pageviews tell us something about importance to readers of Wikipedia, while inbound links tell us something about importance to writers of Wikipedia, and I suspect that writers are not a proxy for readers as the editor surveys suggest that Wikipedia writers are not typical of broader society on at least two variables: gender and level of education (might be others, I can't remember).
> > >>
> > >> But I think importance is a relative metric rather than absolute. I think that taking the mean value of importance across a number of WikiProjects in the preliminary experiment may have lost something because it tried (through averaging) to look at importance "generally". I would suspect conducting an experiment considering only the importance ratings wrt a single WikiProject would be more likely to show correlation with pageviews (wrt other articles in that same WikiProject) and inbound links. And I think there are two kinds of inbound links to be considered, those coming from other articles within the same WikiProject and those coming from outside that WikiProject. I suspect different insights will be obtained by looking at both types of inbound links separately rather than treating them as an aggregate. I note also that WikiProjects are not entirely independent of one another but have relationships between them. For example, WikiProject Australian Roads describes itself as an "intersection" (ha ha!)
> > >> of WikiProject Highways and WikiProject Australia, so I expect that we would find greater correlation in importance between related WikiProjects than between unrelated WikiProjects.
> > >>
> > >> When thinking about readers and pageviews, I think we have to ask ourselves is there a difference between popularity and importance. Or whether popularity *is* importance. I sense that, as a group of educated people, those of us reading this research mailing list probably do think there is a difference. Certainly if there is no difference, then this research can stop now -- just judge importance by pageviews. Let's assume a difference then. When looking at pageviews of an article, they are not always consistent over time. Here are the pageviews for Drottninggatan
> > >>
> > >> https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&range=latest-90&pages=Drottninggatan
> > >>
> > >> Why so interesting on 8 April? A terrorist attack occurred there. This spike in pageviews occurs all the time when some topic is in the news (even peripherally as in this case where it is not the article about the terrorist attack but about the street in which it occurred). Did the street become more "important"? I think it became more interesting but not more important. So I think we do have to be careful to understand that pageviews probably reflect interest rather than importance. I note that The Chainsmokers (a music group with a number of songs in the current USA music charts) gets many more Wikipedia article pageviews than the Wikipedia article on Pasteurization but The Chainsmokers are not rated as being of high importance by the relevant WikiProjects while Pasteurization is very important in WikiProject Food and Drink.
> > >> Since pasteurisation prevents a lot of deaths, I think we might agree that in the real world pasteurisation is more important than a music group regardless of what pageviews tell us.
> > >>
> > >> https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&range=latest-90&pages=The_Chainsmokers|Pasteurization
> > >>
> > >> Of course it matters for Wikipedia's success that our *popular* articles are of high quality, but I think we have to be cautious about pageviews being a proxy for importance.
> > >>
> > >> When we look at Wikipedia writers' decisions in tagging the importance of articles to WikiProjects, what do we find? As we know, project tags are often placed on new articles (and often not subsequently reviewed). So while I find that quality tags are often out-of-date, the importance seems to be pretty accurate even on new stub articles. This is because it is the importance of the *topic* that is being assessed which is independent of the Wikipedia article itself. Provided the article is clear enough about what it is about and why it matters (which is the traditional content of that first paragraph or two and failing to provide it will likely result in speedy deletion of the new article), assessment of the topic's importance can be made even at new stub level. This tells us that importance for Wikipedia writers is determined by something outside of Wikipedia (probably their real-world knowledge of that topic space -- one assumes that project taggers are quite interested in the topic space of that project). While article quality hopefully improves over time, I would be surprised if article importance greatly changed over time. Obviously there are counter-examples.
> > >> I am guessing Donald Trump's article may have grown in importance over time but that's probably because his lede para changed. Adding President of the USA into the lede paragraph makes him much more important than he was before in the real world and internal to Wikipedia he has acquired an inbound link from the presumably high-importance President of the USA article. So I think it might be interesting to study those articles whose importance does change over time to see if there are any strong correlations with what is happening to the article inside Wikipedia. I think this set of importance-changing articles may be where we really learn what Wikipedia article characteristics are strongly correlated to "importance" given that importance itself appears to be pretty stable for most articles.
> > >>
> > >> Although not stated explicitly, I imagine we believe that generally less important articles tend to link to more important articles but more important articles don't link to less important articles. And hence in-bound links are likely to matter in assessing importance and that in-bound links from "important" articles are more valuable than in-bound links from less important articles (which creates something of a bootstrapping problem) similar to the issue with Google's PageRank algorithms. But I think we do have some information that Google doesn't have. The average webpage does not have a lede paragraph that situates the topic relative to other topics; a Wikipedia article does. If I have to choose to define Thing X in terms of Thing Y, it tends to suggest that Y is more important than X. If Y also defines itself in terms of X, then it tends to suggest they are equivalent in importance in some way.
> > >> Indeed I suspect when we get to the VERY IMPORTANT topics we will see this kind of circular definition (e.g. you see circular definitions in Wikipedia around Philosophy and Knowledge). Aside, if you have never done this before, try this experiment. Choose a random article (left hand tool bar in Desktop Wikipedia), then click the first link in the article that matters (i.e. ignore links in hatnotes or links inside parentheses). Repeat this first link clicking and sooner or later you will reach articles like Knowledge and Philosophy, which all sit inside circular definition groups.
> > >>
> > >> If we look at the Donald Trump article, his first sentence contains only two links, one to List of Presidents of the USA and the other to President of the USA. If we look at those two articles, we find that both of them mention Donald Trump in their lede paras (although not as early as the first sentence) and before mentions of any other US President elsewhere in the article. Which is consistent with what we know about the real world, the role of the President is more important than its officeholders and that the current officeholder has more importance than a past officeholder. So topic importance does seem to be skewed towards the "present day".
> > >>
> > >> So I suspect the links in the lede paras are of greater relevance to the assessment of importance than links further down in the article which will more likely relate to details of a topic and may include examples and counter-examples (this is a way in which a high importance article may mention much lower importance articles). However, we do have to be a little bit careful here because of the MoS practice of not linking very common terms.
> > >> For example, an Australian article will often refer to Australia in the lede para but it will almost certainly not be linked to the Australia article (and any attempt to add such a link will likely see it removed with an edit summary that mentions [[WP:Overlinking]]) whereas there is no problem if you link to an Australian state article, e.g. New South Wales. So we might find that some very important topics that often appear in ledes might get fewer links than you might expect because of the MoS policies on overlinking, which may be a problem when working with inbound links. It may be that for "very common topics" the presence of the article title (or its synonyms) in the lede may have to be considered as if it were an in-bound link for statistical research purposes.
> > >>
> > >> Given all of the above, perhaps the most interesting group of articles to study in Wikipedia are those articles whose manually-assessed importance has changed over the life of the article AND which were NOT current topics in the lifetime of Wikipedia (given the influence of "current" on importance). But having said that, I wonder if that group of articles actually exists. Recently a newish Australian contributor expressed disappointment that all the new articles they had created were tagged (by others) as of Low Importance. My instinctive reply was "that's normal, I think of the thousands of articles I have started only a couple even rated as Mid importance, this is because the really important articles were all started long ago precisely because they were important".
> > >> I suspect topics that are very important (for reasons other than short-lived importance due to being "current" in the lifetime of Wikipedia) will generally show up as having started early in Wikipedia's life and that those that become more/less important over time will be largely linked to becoming or ceasing to be "current" topics. E.g. article Pasteurization started in May 2001 saying nothing more than "Pasteurization is the process of killing off bacteria in milk by quickly heating it to a near boiling temperature, then quickly cooling it again before the taste and other desirable properties are affected. The process was named after its inventor, French scientist Louis Pasteur. See also dairy products." The links in this very first version are still present in its lede paragraph today, suggesting our understanding of "non-current" topics is stable and hence initial importance determinations can probably be accurately made. For Pasteurization the Talk page shows it was not project-tagged until 2007 when it was assigned High Importance as its first assessment.
> > >>
> > >> I suspect we will find that initial manual assessment of article importance will be pretty accurate for most articles. And I suspect if we plot initial importance assessments against time of assessment, we will find the higher importance articles commenced life on Wikipedia earlier than the lower importance articles.
> > >> If I am correct, then there isn't a lot of value in machine-assessment of importance of topics because it relates to factors external to Wikipedia and often does not change over time and therefore can often be correctly assessed manually even on new stub articles (and any unassessed articles can probably be rated as Low Importance as statistically that's almost certainly going to be correct). If a topic becomes more important due to "current" events, then invariably that article will be updated by many people and one of them will sooner or later manually adjust its importance. What is less likely to happen is re-assessing downwards of Importance when an important "current" topic loses its importance when it is no longer current, e.g. are former American presidents like Barack Obama or George W Bush or further back less important now? These articles will not be updated frequently once the topic is no longer in the news and therefore it is less likely an editor will notice and manually downgrade the importance, so there may be a greater role for machine-assessment in downgrading importance rather than upgrading importance.
> > >>
> > >> Another area where there might be a role for machine-assessed importance is in regard to POV-pushing, where a POV-motivated editor might change the manual-assessment importance of articles to be higher or lower based on their POV (e.g. my political party is Top Importance, other parties are of Low Importance). I suspect that often a page watcher would correct or at least question that kind of re-assessment. However, on articles with few active pagewatchers you might get away with POV-pushing the article's importance tag because nobody noticed.
> > >> In this situation, a machine assessment could be useful in spotting this kind of thing.
> > >>
> > >> This suggests that another metric of interest to importance might be number of pagewatchers, although I suspect that pagewatching may relate more to caring about the article than to caring about the topic. And one has to be careful to distinguish active pagewatchers (those who actually do review changes on their watchlists) from those who don't, as that may make a difference (although I am not sure we can really tell which pagewatchers are truly actively reviewing as a "satisfactory review" doesn't leave a trace whereas an "unsatisfactory" review is likely to lead to a relatively soon revert or some other change to the article, the article Talk or the User Talk of reviewed contributor which may be detectable).
> > >>
> > >> The other aspect of articles that occurs to me as being possibly linked to importance of the topic would be use of the article as the "main" article for a category or as the title of a navbox (as it suggests that the articles in the category or navbox are in some way subordinate to the main/title article). Similarly for list articles, the "type" of the list is often more important than its instances.
> > >>
> > >> Kerry
> > >>
> > >> -----Original Message-----
> > >> From: Wiki-research-l [mailto:wiki-research-l-[email protected]]
> > >> On Behalf Of Morten Wang
> > >> Sent: Friday, 21 April 2017 6:04 AM
> > >> To: Research into Wikimedia content and communities <[email protected]>
> > >> Subject: Re: [Wiki-research-l] Project exploring automated classification of article importance
> > >>
> > >> Hi Pine,
> > >>
> > >> These are great pointers to existing practices on enwiki, some of which I've been looking for and/or missed, thanks!
> > >>
> > >> Cheers,
> > >> Morten
> > >>
> > >>> On 19 April 2017 at 22:35, Pine W <[email protected]> wrote:
> > >>>
> > >>> Hi Nettrom,
> > >>>
> > >>> A few resources from English Wikipedia regarding article importance as ranked by humans:
> > >>>
> > >>> https://en.wikipedia.org/wiki/Wikipedia:Vital_articles
> > >>>
> > >>> https://en.wikipedia.org/wiki/Wikipedia:Version_1.0_Editorial_Team/Release_Version_Criteria#Priority_of_topic
> > >>>
> > >>> https://en.wikipedia.org/wiki/Wikipedia:WikiProject_assessment#Statistics
> > >>>
> > >>> I infer from the ENWP Wikicup's scoring protocol that for purposes of the competition, an article's "importance" is loosely inferred from the number of language editions of Wikipedia in which the article appears: https://en.wikipedia.org/wiki/Wikipedia:WikiCup/Scoring#Bonus_points .
> > >>>
> > >>> HTH,
> > >>>
> > >>> Pine
> > >>>
> > >>>> On Tue, Apr 18, 2017 at 4:17 PM, Morten Wang <[email protected]> wrote:
> > >>>>
> > >>>> Hello everyone,
> > >>>>
> > >>>> I am currently working with Aaron Halfaker and Dario Taraborelli at the Wikimedia Foundation on a project exploring automated classification of article importance. Our goal is to characterize the importance of an article within a given context and design a system to predict a relative importance rank. We have a project page on meta[1] and welcome comments or thoughts on our talk page. You can of course also respond here on wiki-research-l, or send me an email.
> > >>>>
> > >>>> Before moving on to model-building I did a fairly thorough literature review, finding a myriad of papers spanning several disciplines. We have a draft literature review also up on meta[2], which should give you a reasonable introduction to the topic. Again, comments or thoughts (e.g.
> > >>>> papers we’ve missed) on the talk page, mailing list, or through email are welcome.
> > >>>>
> > >>>> Links:
> > >>>>
> > >>>> 1. https://meta.wikimedia.org/wiki/Research:Automated_classification_of_article_importance
> > >>>> 2. https://meta.wikimedia.org/wiki/Research:Studies_of_Importance
> > >>>>
> > >>>> Regards,
> > >>>> Morten
> > >>>> [[User:Nettrom]] aka [[User:SuggestBot]]
> > >>>> _______________________________________________
> > >>>> Wiki-research-l mailing list
> > >>>> [email protected]
> > >>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
