Erik Zachte wrote:
> Sure, info gets lost. And the Long Tail is meaningful for some research no
> doubt.
> But my resources are finite. 
> 
> Actually I do store some all inclusive counts in the compacted 24 hr file:
> 
> # Lines starting with an at sign (@) show totals per 'namespace' (including omitted counts for low traffic articles)
> # Since valid namespace strings are not known in the compression script, any string followed by a colon (:) counts as a possible namespace string
> # Please reconcile with real namespace name strings later
> # 'namespaces' with count < 5 are combined in 'Other' (on larger wikis these are surely false positives)

Making the script aware of namespace names would be quite easy.
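
As a rough illustration of that idea (not the actual compression script), here is a minimal Python sketch. It assumes the input is a sequence of (title, count) pairs; the names KNOWN_NAMESPACES, namespace_of and totals_per_namespace are hypothetical, and the namespace set is a small sample rather than the real list for any wiki. A prefix before a colon is only treated as a namespace when it appears in the known set, so titles that merely contain a colon fall into the main namespace instead of producing false positives.

```python
from collections import Counter

# Hypothetical sample list for illustration only; a real script would load
# the actual namespace names of the wiki being processed.
KNOWN_NAMESPACES = {
    "Talk", "User", "User_talk", "Wikipedia",
    "File", "Template", "Category", "Help",
}


def namespace_of(title, known=KNOWN_NAMESPACES):
    """Return the namespace prefix of a title, or '' for the main namespace."""
    prefix, sep, _rest = title.partition(":")
    if sep and prefix in known:
        return prefix
    # No colon at all, or the text before the colon is not a real namespace.
    return ""


def totals_per_namespace(page_counts, known=KNOWN_NAMESPACES):
    """Sum view counts per namespace from an iterable of (title, count) pairs."""
    totals = Counter()
    for title, count in page_counts:
        totals[namespace_of(title, known)] += count
    return dict(totals)


sample = [
    ("Talk:Foo", 3),
    ("Star_Wars:_A_New_Hope", 7),  # colon in an article title, not a namespace
    ("Main_Page", 10),
    ("User:Bar", 2),
]
print(totals_per_namespace(sample))
```

With real namespace names available, the count < 5 'Other' bucket would presumably no longer be needed to absorb false positives, though that depends on how the totals are used downstream.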


_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
