Re: [WikiEN-l] How the Professor Who Fooled Wikipedia Got Caught by Reddit, _The Atlantic_
On 20 May 2012 22:32, Gwern Branwen gwe...@gmail.com wrote: There's nothing to answer; and I've been copying the most informative or hilarious quotes for posterity, such as an active administrator in good standing wondering if it might actually increase article quality and not constitute vandalism at all! The whole thing was worth it just for that quote; I could not have made up a better example of the sickness. So, your attempt to prove that no-one cares about external links that aren't references showed that ... no-one cares about external links that aren't references. And that editors should regard ELs on the talk page strictly as notes to self. What I'm feeling about this *feels* just like hindsight bias, but I vaguely recall saying something just like that. - d. ___ WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: https://lists.wikimedia.org/mailman/listinfo/wikien-l
Re: [WikiEN-l] How the Professor Who Fooled Wikipedia Got Caught by Reddit, _The Atlantic_
On Sun, May 20, 2012 at 7:31 PM, Gwern Branwen gwe...@gmail.com wrote: On Sun, May 20, 2012 at 6:09 PM, David Levy lifeisunf...@gmail.com wrote: Yes, there is. Your methodology has been challenged I don't recall any challenges You haven't gone over your methodology. I highly doubt you've selected the links randomly. And you don't seem to have done any analysis of whether or not the links should be there or not. That was my point what percentage of the links were actually good in the first place. Not to try to rationalize results which you hadn't already presented, despite what you think. On Sun, May 20, 2012 at 6:22 PM, Anthony wikim...@inbox.org wrote: Removing 100 random external links? For a few weeks? Then adding back the ones that deserve to be added back? I think it's less questionable to just re-add all the links, no questions asked about 'deserving'. I have no idea which way would be less questionable, nor even what that is supposed to mean. But the right way to do it is to only re-add links which should be added back. ___ WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: https://lists.wikimedia.org/mailman/listinfo/wikien-l
Re: [WikiEN-l] How the Professor Who Fooled Wikipedia Got Caught by Reddit, _The Atlantic_
On Sun, May 20, 2012 at 7:47 PM, David Levy lifeisunf...@gmail.com wrote: Anthony wrote: Removing 100 random external links? For a few weeks? Then adding back the ones that deserve to be added back? Where and when did Gwern specify a time frame and indicate that the appropriate links would be restored? If this is done, then does it cease to be vandalism? Where did you ask Gwern about this? Okay, I'm imagining it Sounds like something that would improve the encyclopedia. Again, what if hundreds or thousands of users, whose methodologies are undiscussed and potentially flawed, were to take it upon themselves to conduct such experiments without consultation or approval? That's the hypothetical scenario to which I referred. Yes, I know. [rolls eyes] That's unconstructive. I disagree. ___ WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: https://lists.wikimedia.org/mailman/listinfo/wikien-l
Re: [WikiEN-l] How the Professor Who Fooled Wikipedia Got Caught by Reddit, _The Atlantic_
On Sun, May 20, 2012 at 7:54 PM, Gwern Branwen gwe...@gmail.com wrote:
The procedure: remove random links and record whether they are restored to obtain a restoration rate.
- To avoid issues with selecting links, I will remove only the final external link on pages selected by http://en.wikipedia.org/wiki/Special:Random#External_links which have at least 2 external links in an 'External links' section, and where the final external link is neither an 'official' link nor template-generated.
So, you are not removing random links at all.
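For readers who want the selection rule quoted above in concrete form, here is a minimal sketch. It is not the script (if any) used in the experiment; the function name and the dict keys `url`, `official` and `template_generated` are assumptions made for illustration, and deciding whether a link is 'official' would in practice be a human judgment.

```python
def removable_last_link(external_links):
    """Return the URL of the final external link if the page qualifies, else None.

    `external_links` is a hypothetical list of dicts, in page order, with the
    keys "url", "official" and "template_generated".
    """
    # The page needs at least 2 links in its 'External links' section.
    if len(external_links) < 2:
        return None
    last = external_links[-1]
    # Skip the page if the final link is 'official' or template-generated.
    if last["official"] or last["template_generated"]:
        return None
    return last["url"]


# Made-up example page with two external links.
page_links = [
    {"url": "http://example.org/official", "official": True, "template_generated": False},
    {"url": "http://example.org/fan-site", "official": False, "template_generated": False},
]
print(removable_last_link(page_links))      # http://example.org/fan-site
print(removable_last_link(page_links[:1]))  # None
```

Written this way, the rule is deterministic once a page is given: the only random step is the choice of page.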
Re: [WikiEN-l] How the Professor Who Fooled Wikipedia Got Caught by Reddit, _The Atlantic_
On Mon, May 21, 2012 at 2:57 AM, David Gerard dger...@gmail.com wrote: On 20 May 2012 22:32, Gwern Branwen gwe...@gmail.com wrote: There's nothing to answer; and I've been copying the most informative or hilarious quotes for posterity, such as an active administrator in good standing wondering if it might actually increase article quality and not constitute vandalism at all! The whole thing was worth it just for that quote; I could not have made up a better example of the sickness. So, your attempt to prove that no-one cares about external links that aren't references showed that ... no-one cares about external links that aren't references. That aren't references, that aren't official, that aren't template-generated, and that aren't the only external link on the page, What I'm feeling about this *feels* just like hindsight bias, but I vaguely recall saying something just like that. Certainly makes sense. What doesn't make much sense is the simultaneous belief that 1) no one cares; and 2) it is vandalism that absolutely *must be stopped* lest Kant roll over in his grave. ___ WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: https://lists.wikimedia.org/mailman/listinfo/wikien-l
Re: [WikiEN-l] How the Professor Who Fooled Wikipedia Got Caught by Reddit, _The Atlantic_
On Mon, May 21, 2012 at 8:11 AM, Anthony wikim...@inbox.org wrote:
On Sun, May 20, 2012 at 7:47 PM, David Levy lifeisunf...@gmail.com wrote:
Anthony wrote: Okay, I'm imagining it. Sounds like something that would improve the encyclopedia.
Again, what if hundreds or thousands of users, whose methodologies are undiscussed and potentially flawed, were to take it upon themselves to conduct such experiments without consultation or approval? That's the hypothetical scenario to which I referred.
Yes, I know.
Thousands of users all taking it upon themselves to act in good faith, without discussion and in ways which are potentially flawed, to try to improve an encyclopedia in the way they see best. We should come up with a catchy name for that. Maybe something based on a Hawaiian word.
Re: [WikiEN-l] How the Professor Who Fooled Wikipedia Got Caught by Reddit, _The Atlantic_
On Mon, May 21, 2012 at 2:57 AM, David Gerard dger...@gmail.com wrote:
What I'm feeling about this *feels* just like hindsight bias, but I vaguely recall saying something just like that.
It certainly sounds like it too. :) But if you ever re-find where you said that, you get some Gwern points.
On Mon, May 21, 2012 at 8:07 AM, Anthony wikim...@inbox.org wrote:
You haven't gone over your methodology. I highly doubt you've selected the links randomly. And you don't seem to have done any analysis of whether or not the links should be there or not.
On Mon, May 21, 2012 at 8:15 AM, Anthony wikim...@inbox.org wrote:
So, you are not removing random links at all.
I should just link XKCD here, but I'll forbear. I am reminded of an anecdote describing a court case involving the draft back in Vietnam, where the plaintiff's lawyer argued that the little cage and balls method was not random and was unfair because the balls on top were much more likely to be selected. The judge asked, "Unfair to *whom*?" Indeed.
And I'd note that my methodology, while being quite as random as most methods, carries the usual advantages of determinism: anyone will be able to check whether I did in fact remove only last links which are not official or template-generated in External Link sections, and that I did not simply cherrypick the links that I thought were worst and so least likely to be restored.
-- gwern http://www.gwern.net
Re: [WikiEN-l] How the Professor Who Fooled Wikipedia Got Caught by Reddit, _The Atlantic_
Anthony wrote: Removing 100 random external links? For a few weeks? Then adding back the ones that deserve to be added back? Where and when did Gwern specify a time frame and indicate that the appropriate links would be restored? If this is done, then does it cease to be vandalism? No. Where did you ask Gwern about this? My above question was a sincere response to your mention of specific details, not a rhetorical complaint (though I do believe that it was incumbent upon Gwern to volunteer such information to the community or the WMF for review *before* engaging in mass vandalism). As discussed in this thread, it isn't clear that Gwern's parameters are likely to yield useful information, so this might amount to nothing more than random vandalism. Imagine if hundreds or thousands of editors took it upon themselves to conduct such experiments without consulting the community or the WMF. Removing 100 random external links? For a few weeks? Then adding back the ones that deserve to be added back? Okay, I'm imagining it Sounds like something that would improve the encyclopedia. Again, what if hundreds or thousands of users, whose methodologies are undiscussed and potentially flawed, were to take it upon themselves to conduct such experiments without consultation or approval? That's the hypothetical scenario to which I referred. Yes, I know. And you believe that this would improve the encyclopedia? (Please keep in mind that knowledge of a time frame and commitment to restore the links that deserve to be added back aren't actually included in the scenario; we would know little or nothing about the hypothetical users' plans.) Thousands of users all taking in upon themselves to act in in good faith, without discussion and in ways which are potentially flawed, to try to improve an encyclopedia in the way they see best. We should come up with a catchy name for that. Maybe something based on a Hawaiian word. 
good faith != prudence way they see best != best way wiki != anarchy An editor, acting in good faith, might believe that inserting original research and edit-warring to keep it in place improves the encyclopedia. That doesn't mean that we're obligated to condone such behavior, let alone without discussion. What doesn't make much sense is the simultaneous belief that 1) no one cares; People obviously care about vandalism. This simply isn't a glaring type, nor does it affect an element of the utmost importance. and 2) it is vandalism that absolutely *must be stopped* lest Kant roll over in his grave. Our default position is to condemn vandalism and seek to counter it. The onus is on Gwern to establish that a special exception should be made. David Levy ___ WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: https://lists.wikimedia.org/mailman/listinfo/wikien-l
Re: [WikiEN-l] Page Ratings analysis?
Hi Steve,
• results of early tests of AFT4 are here: http://www.mediawiki.org/wiki/Article_feedback/Research
• Adam Hyland recently posted a series of compelling analyses comparing AFT data with WP1.0 ratings here: https://en.wikipedia.org/wiki/User:Protonk/Article_Feedback
• there are recent research papers I've come across using AFT data, but I don't think they are published or publicly available yet (when they are they'll get covered in the research newsletter)
• all AFT4 data is publicly available on the toolserver.
HTH
Dario
On May 20, 2012, at 8:52 PM, Steve Bennett wrote:
Hi all, Just wondering if there is any published analysis from the Page ratings widget that appears on every page. My subjective impression is that the ratings data is pretty bad, but I'd be interested to read up. Thanks, Steve
Re: [WikiEN-l] How the Professor Who Fooled Wikipedia Got Caught by Reddit, _The Atlantic_
On 5/21/2012 12:33 PM, Carcharoth wrote: one was a link to a find-a-grave page with a photo of the subject (unneeded because we already had a photo of the subject) That is arguable. It depends whether it is the same photo at the same time of life or not. If the only free photo of someone shows them in old age, a link to a site legally hosting a picture of them in their youth would be relevant and should be kept in the external links section as something that readers would likely want to follow. (It also betrays an attitude of: we have one image, we don't need any more, as opposed to curating a visual record of the topic). Actually, the reverse was true: the picture we had was her official photograph from her tenure in congress (1960-1975), and the picture from find-a-grave, which is not dated, is obviously a picture of a substantially older woman. As she lived for another 13 years after retiring from congress, it is likely that the picture was taken during that period. And yes, the photo we are using is PD (as are all Congressional portraits), which is likely why that is the photo used in the article. This leads me on to one of the big gripes I have about Wikipedia and its use of images. Because of the free-content model that Wikipedia is based on, the image use in articles tends to be skewed towards public domain and freely licensed images. For many subjects, this is not a problem, but for some subjects to get a balanced *visual* record of a topic, you need to use (or refer in the text to) non-free images as well, or if fair use is not possible, to link to a site that legally hosts such images. I don't get involved in the image wars. I tend to look for PD images simply because they aren't going to be entangled in those wars, but I don't have the absolutist mentality of only PD images or all of the images possible, copyrights be damned that we see all too often here. 
The 'ideal' encyclopedia would use these images (and likely have to pay to use them), but Wikipedia seems to think that it is possible to have encyclopedia articles that use free images only, and still maintain NPOV in terms of the images used. I actually think that in some cases the use of only PD or free sources skews the visual presentation, and badly so. What I tend to do in such cases is link to places where the reader can view such images. I can provide some examples if anyone wishes to discuss this. Carcharoth As I noted (in the edit summary, and in my discussion here), the link was of limited utility, as it's simply a black-and-white photo of the subject, with absolutely no information (date, copyright, etc.), and was probably taken after her congressional career ended, after which her profile was substantially lower. I don't see how (in this case, at least) the removal of the link unbalances the article in any way. FWIW, the article in question is [[Julia Butler Hansen]], so you can look at the article and assess whether the removal of the link was damaging. ___ WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: https://lists.wikimedia.org/mailman/listinfo/wikien-l
Re: [WikiEN-l] How the Professor Who Fooled Wikipedia Got Caught by Reddit, _The Atlantic_
On Mon, May 21, 2012 at 11:39 AM, Gwern Branwen gwe...@gmail.com wrote:
On Mon, May 21, 2012 at 8:15 AM, Anthony wikim...@inbox.org wrote:
So, you are not removing random links at all.
I should just link XKCD here, but I'll forbear. I am reminded of an anecdote describing a court case involving the draft back in Vietnam, where the plaintiff's lawyer argued that the little cage and balls method was not random and was unfair because the balls on top were much more likely to be selected. The judge asked, "Unfair to *whom*?" Indeed.
From the beginning you seem to be under the mistaken impression that I am trying to defend Wikipedia or defend the current Wikipedia processes or something. I am not. I find your experiment interesting. I think it would be more interesting if your selection of links were truly random, though. I don't think you should describe your experiment as removal of 100 random external links by an IP, because your selection was not at all random. I don't say this because I am trying to prove something about the results. I say it because it is a flaw in your methodology.
And I'd note that my methodology, while being quite as random as most methods, carries the usual advantages of determinism: anyone will be able to check whether I did in fact remove only last links which are not official or template-generated in External Link sections, and that I did not simply cherrypick the links that I thought were worst and so least likely to be restored.
How could we do that? You could have just cherrypicked the worst links that were last links which are not official or template-generated in External Link sections. I'm not saying I think you did that. But you certainly could have.
Anyway, the main thing I'd like to say about all of this is simply that your selection is not random. Your sample is biased. Biased in which direction, I don't know. Biased intentionally, I doubt. But your sample is biased.
Re: [WikiEN-l] How the Professor Who Fooled Wikipedia Got Caught by Reddit, _The Atlantic_
Again, what if hundreds or thousands of users, whose methodologies are undiscussed and potentially flawed, were to take it upon themselves to conduct such experiments without consultation or approval? That's the hypothetical scenario to which I referred. Yes, I know. And you believe that this would improve the encyclopedia? (Please keep in mind that knowledge of a time frame and commitment to restore the links that deserve to be added back aren't actually included in the scenario; we would know little or nothing about the hypothetical users' plans.) I believe I answered this above. Trusting people to act in good faith in the way that they feel is in the long-term best interest of creating an encyclopedia is what Wikipedia is all about. Anyway, the world would be drastically different if hundreds or thousands of people were curious enough to conduct such experiments. In my opinion, it would probably be a better place. An editor, acting in good faith, might believe that inserting original research and edit-warring to keep it in place improves the encyclopedia. That doesn't mean that we're obligated to condone such behavior, let alone without discussion. There is a difference between not-condoning the behavior, and calling it vandalism. Do I think Gwern made mistakes in his experiment? Absolutely. I've already said many times that I think his sample was biased. There's also a difference between temporarily removing 100 external links, and edit-warring over the insertion of original research. Gwern wasn't edit-warring at all. What he did was much less disruptive. What doesn't make much sense is the simultaneous belief that 1) no one cares; People obviously care about vandalism. This simply isn't a glaring type, nor does it affect an element of the utmost importance. It isn't vandalism. He wasn't doing it for the purpose of hurting the encyclopedia. and 2) it is vandalism that absolutely *must be stopped* lest Kant roll over in his grave. 
Our default position is to condemn vandalism and seek to counter it. The onus is on Gwern to establish that a special exception should be made. It isn't vandalism. Assume good faith. ___ WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: https://lists.wikimedia.org/mailman/listinfo/wikien-l
Re: [WikiEN-l] How the Professor Who Fooled Wikipedia Got Caught by Reddit, _The Atlantic_
On Mon, May 21, 2012 at 5:32 PM, Anthony wikim...@inbox.org wrote: How could we do that? You could have just cherrypicked the worst links that were last links which are not official or template-generated in External Link sections. I'm not saying I think you did that. But you certainly could have. Cherrypicking even under this strategy would force me to do both 2x as much work and engage in conscious deception. If I were consciously trying to deceive, I would have adopted an entirely unverifiable strategy like 'roll a dice' or 'pick a random integer 0-length of links' and then would have both cherry-picked without problem and much less overall effort (as I had to throw out something like a third to half the pages with external links because they did not meet one of the criteria). Anyway, the main thing I'd like to say about all of this is simply that your selection is not random. Your sample is biased. Biased in which direction, I don't know. Biased intentionally, I doubt. But your sample is biased. Sheesh. Every sample is biased in many ways - but random samples are biased in unpredictable ways, which is why randomizing was such a big innovation when Fisher and his contemporaries introduced it. What's next, PRNGs are unacceptable for any kind of study because you can predict each output if you know the seed and run the PRNG appropriately? -- gwern ___ WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: https://lists.wikimedia.org/mailman/listinfo/wikien-l
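The remark about PRNGs being predictable from their seed can be illustrated with a small sketch. It assumes, purely for illustration, that the external links have been numbered 1 to N; the function name, the seed value and N = 1,000,000 are hypothetical and come from nowhere in the thread.

```python
import random


def sample_link_indices(n_links, k, seed):
    """Draw k distinct link numbers from 1..n_links using a seeded PRNG."""
    rng = random.Random(seed)  # private generator, so the global state is untouched
    return sorted(rng.sample(range(1, n_links + 1), k))


# Two runs with the same published seed give the same draw, so a third
# party can re-derive exactly which links were supposed to be picked.
first = sample_link_indices(1_000_000, 100, seed=2012)
second = sample_link_indices(1_000_000, 100, seed=2012)
print(first == second)  # True
```

Predictability of this kind is what makes a draw auditable; it says nothing, by itself, about whether the population being drawn from is the right one.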
Re: [WikiEN-l] How the Professor Who Fooled Wikipedia Got Caught by Reddit, _The Atlantic_
On Mon, May 21, 2012 at 6:02 PM, Gwern Branwen gwe...@gmail.com wrote: On Mon, May 21, 2012 at 5:32 PM, Anthony wikim...@inbox.org wrote: How could we do that? You could have just cherrypicked the worst links that were last links which are not official or template-generated in External Link sections. I'm not saying I think you did that. But you certainly could have. Cherrypicking even under this strategy would force me to do both 2x as much work and engage in conscious deception. Yes. I'm not saying I think you did that. It never crossed my mind that you might have intentionally tried to bias the sample, until you said anyone will be able to check whether I did. We can't check. We simply have to trust you that you picked the links in the way that you claim to have picked the links. In any case, it really doesn't matter, because your sample *was* biased, regardless of your intention. Anyway, the main thing I'd like to say about all of this is simply that your selection is not random. Your sample is biased. Biased in which direction, I don't know. Biased intentionally, I doubt. But your sample is biased. Sheesh. Every sample is biased in many ways - but random samples are biased in unpredictable ways, which is why randomizing was such a big innovation when Fisher and his contemporaries introduced it. What's next, PRNGs are unacceptable for any kind of study because you can predict each output if you know the seed and run the PRNG appropriately? You should read more about sampling bias. Or talk to someone who has. PRNGs are acceptable, though you do have to be careful to avoid publication bias. If you took a list of all external links, and then used a PRNG to pick 100 numbers between 1 and N (the number of links), and then removed those external links, then you would have a random sample. 
The fact that you can predict each output if you know the seed and run the PRNG appropriately would only come into play if you ran the test several times, with different seeds, and selected one of the runs. By picking articles first, then picking links, you introduce bias. You are biasing your links toward those which are in articles with fewer links. These are probably less likely to be noticed when removed, because articles with lots of links are more likely to be on watchlists, and tend to have more objective criteria. By limiting yourself to links in the External Links section, you introduce bias. These links tend to be the least useful, as they are essentially miscellanea. By limiting yourself to links which are not official, you introduce bias. This one is pretty obvious, I think, and it is one introduction of bias which I think you did intentionally. The removal of official links is quite clearly more likely to be reverted. By limiting yourself to links in articles with more than one external link, and only to links which are not template-generated, you introduce bias. You pretty much admit this, and admit that the bias was intentional (avoids issues where pages might have 5 or 10 'official' external links to various versions or localizations, all of which an editor could confidently and blindly revert the removal of; template-generated links also carry imprimaturs of authority). All of this is fine, by the way, depending on what your intention was to show. If it was to show that a certain type of external link can be removed without likely being reverted, then your methodology is fine. But then you shouldn't advertise your experiment as the removal of 100 random external links, because that is not what you did. ___ WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: https://lists.wikimedia.org/mailman/listinfo/wikien-l
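The claim that picking articles first favours links in articles with fewer links can be checked with exact arithmetic. The sketch below is a simplified model, not the procedure used in the experiment: it assumes a page is chosen uniformly and then one of its links is chosen uniformly, and the two pages and their link counts are made up.

```python
from fractions import Fraction


def page_first_probabilities(pages):
    """Chance each link is drawn when a page is picked uniformly, then a link within it."""
    n_pages = len(pages)
    return {
        (title, link): Fraction(1, n_pages) * Fraction(1, len(links))
        for title, links in pages.items()
        for link in links
    }


def uniform_link_probabilities(pages):
    """Chance each link is drawn when one link is picked uniformly from all links."""
    total = sum(len(links) for links in pages.values())
    return {
        (title, link): Fraction(1, total)
        for title, links in pages.items()
        for link in links
    }


# Hypothetical data: one page with 2 external links, one with 8.
pages = {
    "Short article": ["a1", "a2"],
    "Long article": ["b1", "b2", "b3", "b4", "b5", "b6", "b7", "b8"],
}
page_first = page_first_probabilities(pages)
uniform = uniform_link_probabilities(pages)
print(page_first[("Short article", "a1")])  # 1/4
print(page_first[("Long article", "b1")])   # 1/16
print(uniform[("Short article", "a1")])     # 1/10
```

In this toy model a link on the short page is four times as likely to be drawn as a link on the long page, whereas a uniform draw over all links gives every link the same 1/10 chance.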
Re: [WikiEN-l] How the Professor Who Fooled Wikipedia Got Caught by Reddit, _The Atlantic_
Anthony wrote: I believe I answered this above. Trusting people to act in good faith in the way that they feel is in the long-term best interest of creating an encyclopedia is what Wikipedia is all about. I answered *that* by pointing out that we don't indiscriminately permit good-faith editors to do whatever they feel is in the long-term best interest of creating an encyclopedia. When they operate outside the established framework (without consensus that an exception is warranted), we intervene. There is a difference between not-condoning the behavior, and calling it vandalism. _Gwern_ has called it vandalism continually (both in this discussion and on Jimbo's talk page) and even mocked a user for suggesting otherwise. Do I think Gwern made mistakes in his experiment? Absolutely. And those mistakes could have been prevented via consultation with the Wikipedia editing community. There's also a difference between temporarily removing 100 external links, and edit-warring over the insertion of original research. Gwern wasn't edit-warring at all. What he did was much less disruptive. Agreed. I haven't equated the two. It isn't vandalism. Then why does Gwern keep referring to it as such? He wasn't doing it for the purpose of hurting the encyclopedia. Agreed. But vandalism is any addition, removal, or change of content in a deliberate attempt to compromise the integrity of Wikipedia. The experiment is based entirely upon compromising the integrity of Wikipedia and observing editors' reactions (or lack thereof). That Gwern presumably perceives some long-term benefit has no bearing on the immediate effect. Of course, Gwern openly acknowledges that he/she committed blatant vandalism, so you needn't dispute this on his/her behalf. Our default position is to condemn vandalism and seek to counter it. The onus is on Gwern to establish that a special exception should be made. It isn't vandalism. 
Setting aside the issue of terminology (addressed above), our default position is to condemn the type of edit that Gwern performed and seek to counter it. The onus is on Gwern to establish that a special exception should be made. Assume good faith. At no point have I accused Gwern of acting in bad faith. I merely believe that he/she has behaved inappropriately. David Levy ___ WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: https://lists.wikimedia.org/mailman/listinfo/wikien-l