Sure, info gets lost. And the Long Tail is meaningful for some research, no doubt. But my resources are finite.

Actually I do store some all-inclusive counts in the compacted 24 hr file:

# Lines starting with at sign (@) show totals per 'namespace' (including omitted counts for low traffic articles)
# Since valid namespace strings are not known in the compression script any string followed by colon (:) counts as possible namespace string
# Please reconcile with real namespace name strings later
# 'namespaces' with count < 5 are combined in 'Other' (on larger wikis these are surely false positives)
@ aa.z Category 9
@ aa.z File 20
@ aa.z Image 9
@ aa.z MediaWiki 20
@ aa.z NamespaceArticles 163
@ aa.z Special 97
@ aa.z Talk 17
@ aa.z User 35
@ aa.z Wikipedia 16
@ aa.z -other- 11

Erik Zachte

> -----Original Message-----
> From: [email protected] [mailto:wikitech-l-[email protected]] On Behalf Of Robert Rohde
> Sent: Friday, September 18, 2009 02:33
> To: Wikimedia developers
> Cc: Mathias Schindler; Frédéric Schütz; toolserver-[email protected]
> Subject: Re: [Wikitech-l] [Toolserver-l] Archive of visitor stats
>
> 2009/9/17 Erik Zachte <[email protected]>:
> > I think it is extremely important to keep these files for later
> > analysis by historians and others.
> >
> > Mathias Schindler also keeps an archive, or at least did till April
> > (Berlin conference).
> > He even bought a dedicated external drive for it.
> >
> > I collect files daily and merge 24 hourly files into one daily file.
> > That saves a lot on disk space and makes processing faster.
> > Titles with fewer than 10 requests per day are discarded, which also
> > saves a lot.
>
> Careful, a recent analysis I did suggested that 15% of all page
> requests for articles on Wikipedia are for topics requested less than
> once per hour. There are a very large number of pages that rarely see
> hits, but collectively the traffic to such topics is important. You
> could end up biasing certain kinds of analysis if you always exclude
> the rarely visited pages.
>
> -Robert Rohde
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
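
For anyone who wants to reproduce the idea, here is a rough Python sketch of the compaction described above: merge the hourly counts into daily counts, drop titles with fewer than 10 requests per day, and still keep per-'namespace' totals that include the dropped titles. This is not the actual compression script. The record layout (project, title, count), the function names, and the choice to count colon-less titles under 'NamespaceArticles' are all assumptions made for illustration.

```python
from collections import Counter

MIN_TITLE_REQUESTS = 10  # from the thread: titles below this per day are dropped
MIN_NAMESPACE_TOTAL = 5  # from the file header: smaller 'namespaces' are combined


def guess_namespace(title):
    """Treat any non-empty string followed by a colon as a possible namespace."""
    prefix, sep, _rest = title.partition(":")
    # Assumption: titles without a colon are counted as 'NamespaceArticles'.
    return prefix if sep and prefix else "NamespaceArticles"


def compact_day(hourly_records):
    """Merge (project, title, count) records from the hourly files of one day.

    Returns (kept, totals):
      kept   maps (project, title) to the daily count, for titles that reach
             MIN_TITLE_REQUESTS;
      totals maps (project, namespace) to the daily sum over ALL titles,
             including the ones left out of 'kept'.
    """
    daily = Counter()
    for project, title, count in hourly_records:
        daily[(project, title)] += count

    # Totals are computed before discarding anything, so the long tail
    # still shows up in aggregate.
    raw_totals = Counter()
    for (project, title), count in daily.items():
        raw_totals[(project, guess_namespace(title))] += count

    # Low-count 'namespaces' are probably false positives; fold them together.
    totals = Counter()
    for (project, namespace), count in raw_totals.items():
        if count < MIN_NAMESPACE_TOTAL:
            namespace = "-other-"
        totals[(project, namespace)] += count

    kept = {key: n for key, n in daily.items() if n >= MIN_TITLE_REQUESTS}
    return kept, totals


def format_totals(totals):
    """Render totals as '@ project namespace count' lines."""
    return [
        f"@ {project} {namespace} {count}"
        for (project, namespace), count in sorted(totals.items())
    ]
```

With totals kept this way, the share of traffic going to discarded titles can still be estimated per project and namespace by subtracting the summed counts of the kept titles from the matching '@' total.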
