On Sat, Oct 1, 2011 at 9:58 PM, Ian Woollard <[email protected]> wrote:
> On 1 October 2011 18:15, Carcharoth <[email protected]> wrote:
>
>> The assumption "Presumably anything that still remains is of
>> sufficient quality for whatever level the article is" has so much
>> wrong with it that I don't know where to start.
>
>
> No, if material lasts for a long period in an article it's highly likely to
> be fairly good even if it gets rewritten later; and the more material and
> the longer it lasts, the better.

Material lasts a long time for two reasons:

(a) It is good and lots of people have checked it and left it alone;
(b) It is bad/wrong and no-one has spotted it yet and replaced it or rewritten it.

I don't see how you can devise a metric to distinguish these two cases, as you would have to detect the number of people silently checking and approving something (not just reading it). Lots of quality control is *silent* and not detectable in the current metrics. It would be different if there were a way for people to mark text and say "I have this book and have checked this citation, or followed the URL and agree with what is written here". Essentially, that would be a way to detect the silent verification that often takes place.

> It's the area under the curve that matters, not whether it *eventually* gets
> rewritten.
>
> So time_in_article * number_of_unique_characters is probably a fairly good
> metric.

Not in the case of obscure articles written by one person, not linked much from anywhere (but not triggering orphan article bots), and with only small changes made over the years. View stats might help here, but probably not much, as there are a vast, vast number of articles not visited very much at all. Those would account for most of the "unchanged text" you would be picking up.

> And you could multiply by the article hit rate to get an even better metric
> I expect.
>
> Whereas you can get very high edit counts by many well-known ways, even
> breaking an edit down into many sub-edits can multiply up edit counts, or
> just doing lots of vandalism reverts.

Yes, I never said edit count was reliable for anything or useful in any way. I'm only saying that unique text is likely not very helpful either. But the best way to find out is to actually try this and see if it shows anything useful. If it does, great. If not, then try again.
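
As a starting point for trying it, here is a minimal sketch of the proposed metric (time_in_article * number_of_unique_characters, optionally multiplied by the hit rate). Everything in it is an assumption made for illustration: the `TextSpan` record, the field names, and the sample numbers are hypothetical and do not come from any real article or from any MediaWiki tool.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TextSpan:
    """Hypothetical record for one surviving chunk of contributed text."""
    unique_chars: int         # characters in the chunk that are unique to it
    days_in_article: float    # how long the chunk has survived in the article
    daily_views: float = 1.0  # article hit rate (made-up units: views per day)


def persistence_score(spans: Iterable[TextSpan], weight_by_views: bool = False) -> float:
    """Sum of unique_chars * days_in_article, optionally times the hit rate."""
    total = 0.0
    for span in spans:
        score = span.unique_chars * span.days_in_article
        if weight_by_views:
            score *= span.daily_views
        total += score
    return total


# Case (a): text on a busy article that many readers may have silently checked.
well_checked = [TextSpan(unique_chars=500, days_in_article=1000, daily_views=200.0)]
# Case (b): the same amount of text on an obscure article nobody looks at.
neglected = [TextSpan(unique_chars=500, days_in_article=1000, daily_views=0.5)]

print(persistence_score(well_checked), persistence_score(neglected))
print(persistence_score(well_checked, weight_by_views=True),
      persistence_score(neglected, weight_by_views=True))
```

With these made-up inputs the unweighted score is identical for both cases, which is the problem described above. Weighting by views does separate them, but views only count readers, not people who actually verified the text.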

<snip>

Carcharoth
