You might also consider http://buzzlog.yahoo.com/overall/, which lists the topics the world is searching for.
Bob

On 9/30/2011 1:24 PM, Ian Woollard wrote:
> The raw dumps are here:
>
> http://dammit.lt/wikistats/
>
> IIRC the compressed files consist of the list of the articles that were
> accessed, in the order they were retrieved. You have to process them to
> count how often each article was read.
>
> Of course:
>
> http://stats.grok.se/
>
> has done that heavy lifting already, and they keep lists of the most
> popular articles.
>
> On 30 September 2011 18:53, Michael Katz <[email protected]> wrote:
>
>> Thanks for the reply. Can you tell me exactly which dump files you'd
>> look in to find the number of page views, plus any information about
>> finding the page views within those files, if it's not obvious? Is
>> there a way to distinguish between editor page views and reader page
>> views? (Perhaps subtract the number of edits made? If so, how can I
>> find the number of edits made?)
>>
>> Something about page views seems a little funny, because it seems like
>> there are some very recognizable things that just aren't looked up
>> much. But perhaps it's my best hope...
>>
>> ________________________________
>> From: WereSpielChequers <[email protected]>
>> To: Michael Katz <[email protected]>; English Wikipedia
>> <[email protected]>
>> Sent: Friday, September 30, 2011 2:55 AM
>> Subject: Re: [WikiEN-l] finding the "most recognizable" page names
>>
>> Hi Michael,
>>
>> I don't know if such a list exists, other than lists by largest numbers
>> of views.
>>
>> The size of an article probably reflects the interest of one or a few
>> editors and the complexity of the information; I doubt it would closely
>> relate to recognisability. Incoming links is probably a better measure,
>> but it can get awfully skewed by templates, and some links are more
>> meaningful than others.
>>
>> Recognisable in the USA is not necessarily the same as recognisable
>> globally.
>> Ideally, if you want a US-specific list you need US-specific data; if
>> you use a global list you could wind up asking Americans about Johnny
>> Vegas, Abi Titmuss, Jack Straw and Kevin Pietersen. You might also
>> consider the generation you are targeting: Lady_Bird_Johnson would be
>> better known among Americans and older people.
>>
>> I'd suggest using page views per article as the metric, and if you want
>> a specifically US product, screen out articles that don't use American
>> English spelling. Better still would be to get page views from the USA,
>> or at least page views ignoring the six hours when the US is most
>> likely to be asleep.
>>
>> WereSpielChequers
>>
>> On 30 September 2011 04:17, Michael Katz <[email protected]>
>> wrote:
>>
>>> I'm making a crossword-style word game, and I'm trying to automate the
>>> process of creating the puzzles, at least somewhat.
>>>
>>> I am hoping to find or create a list of English Wikipedia page titles,
>>> sorted roughly by how "recognizable" they are, where by recognizable I
>>> mean something like "how likely it is that the average American on the
>>> street will be familiar with the name/phrase/subject".
>>>
>>> For instance, just to take a random example, on a recognizability
>>> scale from 0 to 100, I might score (just guessing here):
>>>
>>> Lady_Gaga = 90
>>> Lady_Jane_Grey = 10
>>> Lady_and_the_Tramp = 90
>>> Lady_Antebellum = 5
>>> Lady-in-waiting = 70
>>> Lady_Bird_Johnson = 65
>>> Lady_Marmalade = 10
>>> Ladysmith_Black_Mambazo = 10
>>>
>>> One suggestion would be to use the page length (either the number of
>>> characters or the physical rendered page length) as a proxy for
>>> recognizability. That might work, but it feels kind of crude, and it
>>> would certainly produce many false positives, such as
>>> Bose-Einstein_condensation.
>>> Someone suggested to me that I might count incoming page links, and
>>> referred me to http://dumps.wikimedia.org/enwiki/latest/ and in
>>> particular the file enwiki-latest-pagelinks.sql.gz. I downloaded and
>>> looked at that file but couldn't understand whether/how the linking
>>> structure was represented.
>>>
>>> So my questions are:
>>>
>>> (1) Do you know if a list like the one I'm trying to make already
>>> exists?
>>>
>>> (2) If you were going to make a list like this, how would you do it?
>>> If it were based on page length, which files would you download and
>>> process to make it as efficient as possible? If it were based on
>>> incoming links, which files specifically would you use, and how would
>>> you determine the link count?
>>>
>>> Thanks for any help.
>>>
>>> _______________________________________________
>>> WikiEN-l mailing list
>>> [email protected]
>>> To unsubscribe from this mailing list, visit:
>>> https://lists.wikimedia.org/mailman/listinfo/wikien-l
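The tallying Ian describes — reading an hourly dump and counting how often each article was viewed — can be sketched as follows. This is a minimal sketch, assuming the hourly pagecounts format used by the dammit.lt/wikistats files, where each line is `project page_title view_count bytes_transferred`; the function name `top_articles` is just an illustrative choice.

```python
import gzip
from collections import Counter

def top_articles(path, project="en", limit=20):
    """Tally per-article view counts from one gzipped hourly pagecounts file.

    Assumes each line has the form: <project> <page_title> <views> <bytes>.
    Lines that don't fit that shape (or belong to other projects) are skipped.
    """
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            parts = line.split()
            if len(parts) != 4 or parts[0] != project:
                continue
            title, views = parts[1], parts[2]
            if views.isdigit():
                counts[title] += int(views)  # same title may repeat across files
    return counts.most_common(limit)
```

To cover a longer period you would run this over many hourly files and merge the `Counter` objects — which is essentially the aggregation that stats.grok.se already does.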
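For Michael's incoming-links question, the linking structure in enwiki-latest-pagelinks.sql.gz is stored as huge SQL `INSERT` statements rather than one link per line, which is likely why it looked opaque. A rough sketch of counting incoming links per title is below; it assumes the pagelinks row layout of that era, `(pl_from, pl_namespace, pl_title)`, which should be verified against the `CREATE TABLE` statement at the top of the actual dump.

```python
import gzip
import re
from collections import Counter

# Assumed row layout: (pl_from, pl_namespace, pl_title) -- check the dump's
# CREATE TABLE header before relying on this. Titles are single-quoted SQL
# strings with backslash escapes.
ROW = re.compile(r"\((\d+),(-?\d+),'((?:[^'\\]|\\.)*)'\)")

def count_incoming_links(path, namespace=0):
    """Count incoming links per title from a gzipped pagelinks SQL dump.

    Only links pointing into the given namespace (0 = articles) are counted.
    """
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            if not line.startswith("INSERT INTO"):
                continue
            for _from_page, ns, title in ROW.findall(line):
                if int(ns) == namespace:
                    counts[title] += 1
    return counts
```

As WereSpielChequers notes, raw counts like these get badly skewed by templates, so they would need further filtering before serving as a recognizability score.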
