https://bugzilla.wikimedia.org/show_bug.cgi?id=69371

            Bug ID: 69371
           Summary: zero.log contains duplicate host in logs
           Product: Analytics
           Version: unspecified
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: Unprioritized
         Component: General/Unknown
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected],
                    [email protected], [email protected],
                    [email protected], [email protected]
       Web browser: ---
   Mobile Platform: ---

yurik@stat1002:/a/squid/archive/zero$ zgrep orghttp zero.tsv.log-20140810.gz

produces a significant number of broken entries, whose URL is duplicated:

http://en.m.wikipedia.orghttp://en.m.wikipedia.org/
http://en.m.wikipedia.orghttp://en.m.wikipedia.org/favicon.ico
etc

The first entry is just the host with protocol, the second is identical
protocol+host with the rest of the URL.

Stats in the last few days (I also ran it for the august of last year - similar
numbers, so this is not a recent bug)

yurik@stat1002:/a/squid/archive/zero$ zgrep -c  orghttp zero.tsv.log-201408*
zero.tsv.log-20140801.gz:183
zero.tsv.log-20140802.gz:215
zero.tsv.log-20140803.gz:167
zero.tsv.log-20140804.gz:170
zero.tsv.log-20140805.gz:144
zero.tsv.log-20140806.gz:235
zero.tsv.log-20140807.gz:272
zero.tsv.log-20140808.gz:259
zero.tsv.log-20140809.gz:169
zero.tsv.log-20140810.gz:173


Host distribution:
$ sort tmp | uniq --count
      1 cp1046.eqiad.wmnet
      4 cp1047.eqiad.wmnet
      8 cp1059.eqiad.wmnet
      6 cp1060.eqiad.wmnet
     51 cp3011.esams.wikimedia.org
     58 cp3012.esams.wikimedia.org
     62 cp3013.esams.wmnet
     57 cp3014.esams.wmnet
      4 cp4011.ulsfo.wmnet
      1 cp4012.ulsfo.wmnet
      4 cp4019.ulsfo.wmnet
      3 cp4020.ulsfo.wmnet

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l

Reply via email to