Erik Zachte wrote:
> Sure, info gets lost. And the Long Tail is meaningful for some research no
> doubt.
> But my resources are finite.
>
> Actually I do store some all inclusive counts in the compacted 24 hr file:
>
> # Lines starting with ampersand (@) show totals per 'namespace' (including
> omitted counts for low traffic articles)
> # Since valid namespace string are not known in the compression script any
> string followed by colon (:) counts as possible namespace string
> # Please reconcile with real namespace name strings later
> # 'namespaces' with count < 5 are combined in 'Other' (on larger wikis these
> are surely false positives)
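
For illustration, the rule described in those header comments might be sketched roughly as below. This is only my reading of the comments, not the actual compression script: the function name, the (title, count) input shape, the "(main)" label for titles without a recognised prefix, and the optional known_namespaces argument are all hypothetical, and I am assuming "count < 5" refers to the summed count per prefix.

```python
from collections import Counter

MAIN = "(main)"  # hypothetical label for titles without a recognised prefix


def namespace_totals(page_counts, known_namespaces=None, min_count=5):
    """Sum view counts per apparent namespace.

    page_counts: iterable of (title, count) pairs (assumed input shape).
    known_namespaces: optional set of real namespace names. When None,
        any text before the first colon counts as a possible namespace.
    min_count: prefixes whose summed count is below this go to 'Other'.
    """
    totals = Counter()
    for title, count in page_counts:
        prefix, sep, _rest = title.partition(":")
        is_namespace = bool(sep and prefix) and (
            known_namespaces is None or prefix in known_namespaces
        )
        totals[prefix if is_namespace else MAIN] += count

    # Fold rarely seen prefixes into 'Other'; the main bucket is kept as is.
    folded = Counter()
    for name, total in totals.items():
        key = name if name == MAIN or total >= min_count else "Other"
        folded[key] += total
    return folded


if __name__ == "__main__":
    sample = [("Talk:Foo", 3), ("Talk:Bar", 4), ("Star Trek: Voyager", 2)]
    print(namespace_totals(sample))
    print(namespace_totals(sample, known_namespaces={"Talk"}))
```

Without a namespace list, a title such as "Star Trek: Voyager" produces a false "Star Trek" prefix that only disappears because it is folded into 'Other'. With a list, it is simply counted under the main bucket.
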
Making the script aware of namespace names would be quite easy.

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
