Sure, information gets lost, and the long tail is no doubt meaningful for
some research.
But my resources are finite.

Actually, I do store some all-inclusive counts in the compacted 24-hour file:

# Lines starting with an at sign (@) show totals per 'namespace' (including
# omitted counts for low-traffic articles)
# Since valid namespace strings are not known in the compression script, any
# string followed by a colon (:) counts as a possible namespace string
# Please reconcile with real namespace name strings later
# 'namespaces' with count < 5 are combined in 'Other' (on larger wikis these
# are surely false positives)

@ aa.z Category 9
@ aa.z File 20
@ aa.z Image 9
@ aa.z MediaWiki 20
@ aa.z NamespaceArticles 163
@ aa.z Special 97
@ aa.z Talk 17
@ aa.z User 35
@ aa.z Wikipedia 16
@ aa.z -other- 11
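
A minimal sketch of the tallying rules described in the header comments
above. This is not the actual compression script: the function names, the
input shape (title/request pairs for one project), and the choice to put
titles without a colon under NamespaceArticles and to exempt that bucket
from the threshold are all assumptions made for illustration.

```python
from collections import Counter


def namespace_totals(title_counts, min_count=5):
    """Tally requests per apparent namespace for one project.

    title_counts: iterable of (title, requests) pairs. To be all-inclusive
    it should contain the low-traffic titles that are later discarded.
    """
    totals = Counter()
    for title, requests in title_counts:
        prefix, sep, _ = title.partition(":")
        # Any non-empty string followed by a colon is treated as a possible
        # namespace; everything else is assumed to be a plain article.
        key = prefix if sep and prefix else "NamespaceArticles"
        totals[key] += requests

    # Fold rare prefixes (likely false positives) into one bucket.
    result = Counter()
    for key, count in totals.items():
        if key != "NamespaceArticles" and count < min_count:
            result["-other-"] += count
        else:
            result[key] += count
    return dict(result)


def format_totals(project, totals):
    """Render totals as '@ project namespace count' lines, '-other-' last."""
    ordered = sorted(totals.items(), key=lambda kv: (kv[0] == "-other-", kv[0]))
    return [f"@ {project} {name} {count}" for name, count in ordered]
```

Because the tally runs before low-traffic titles are dropped, the per-namespace
totals still include the requests that the compacted file omits per title.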

Erik Zachte



> -----Original Message-----
> From: [email protected] [mailto:wikitech-l-
> [email protected]] On Behalf Of Robert Rohde
> Sent: Friday, September 18, 2009 02:33
> To: Wikimedia developers
> Cc: Mathias Schindler; Frédéric Schütz; toolserver-
> [email protected]
> Subject: Re: [Wikitech-l] [Toolserver-l] Archive of visitor stats
> 
> 2009/9/17 Erik Zachte <[email protected]>:
> > I think it is extremely important to keep these files for later analysis
> > by historians and others.
> >
> > Mathias Schindler also keeps an archive, or at least did until April
> > (Berlin conference).
> > He even bought a dedicated external drive for it.
> >
> > I collect files daily and merge 24 hourly files into one daily file.
> > That saves a lot of disk space and makes processing faster.
> > Titles with fewer than 10 requests per day are discarded, which also
> > saves a lot.
> 
> Careful, a recent analysis I did suggested that 15% of all page
> requests for articles on Wikipedia are for topics requested less than
> once per hour.  There are a very large number of pages that rarely see
> hits, but collectively the traffic to such topics is important.  You
> could end up biasing certain kinds of analysis if you always exclude
> the rarely visited pages.
> 
> -Robert Rohde
> 
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l


