[Bug 69371] zero.log contains duplicate host in logs

2014-08-14 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=69371 --- Comment #15 from nuria nu...@wikimedia.org --- You can reproduce one of these requests by using telnet as follows. Please note the host and URL below [prompt]$ telnet 91.198.174.204 80 Trying 91.198.174.204... Connected to

[Bug 69371] zero.log contains duplicate host in logs

2014-08-13 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=69371 --- Comment #9 from nuria nu...@wikimedia.org --- Summing up, somehow we are generating requests in the client like: http://es.m.wikipedia.org/http://es.m.wikipedia.org/wiki/Wikipedia:Portada; which, according to HTTP, are valid requests, those
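
The request shape described in Comment #9 (a full URL repeated inside the path of the same host) can be checked mechanically. The following is only a minimal sketch using Python's standard library; the function names `embedded_url` and `has_duplicate_host` are hypothetical and are not taken from the bug or from any Wikimedia tooling.

```python
from typing import Optional
from urllib.parse import urlsplit


def embedded_url(url: str) -> Optional[str]:
    """Return the URL embedded in the path of `url`, or None if the path looks normal.

    Hypothetical helper: it only inspects the path component, so a query
    string on the outer URL is ignored.
    """
    path = urlsplit(url).path.lstrip("/")
    if path.startswith(("http://", "https://")):
        return path
    return None


def has_duplicate_host(url: str) -> bool:
    """True when the path embeds a URL whose host equals the outer host."""
    inner = embedded_url(url)
    if inner is None:
        return False
    return urlsplit(url).hostname == urlsplit(inner).hostname


if __name__ == "__main__":
    # Example URL quoted in Comment #9.
    bad = "http://es.m.wikipedia.org/http://es.m.wikipedia.org/wiki/Wikipedia:Portada"
    good = "http://es.m.wikipedia.org/wiki/Wikipedia:Portada"
    print(has_duplicate_host(bad), has_duplicate_host(good))
```

Comparing hostnames rather than raw strings keeps the check independent of scheme and port differences between the outer and embedded URL.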

[Bug 69371] zero.log contains duplicate host in logs

2014-08-13 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=69371 nuria nu...@wikimedia.org changed: What|Removed |Added Status|NEW |RESOLVED

[Bug 69371] zero.log contains duplicate host in logs

2014-08-13 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=69371 --- Comment #10 from nuria nu...@wikimedia.org --- Need to look at IP ranges, as I looked at languages and wikipedias for geographic commonality and that might not be the best approach.

[Bug 69371] zero.log contains duplicate host in logs

2014-08-13 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=69371 --- Comment #11 from nuria nu...@wikimedia.org --- I take my prior comment back: this looks like a proxy issue, not a client issue. Data below for requests that match orghttp in the month of August thus far in zero, mobile and sampled. 1. zero

[Bug 69371] zero.log contains duplicate host in logs

2014-08-13 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=69371 --- Comment #12 from nuria nu...@wikimedia.org --- IPs with most issues in zero are not the most used IPs so, again, this points to a proxy issue.

[Bug 69371] zero.log contains duplicate host in logs

2014-08-13 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=69371 --- Comment #13 from Yuri Astrakhan yu...@wikimedia.org --- Nuria, are you saying one of our proxies is causing this? Or is it some common proxy software that many carriers are using that sets an incorrect HOST value when forwarding a request?

[Bug 69371] zero.log contains duplicate host in logs

2014-08-13 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=69371 --- Comment #14 from nuria nu...@wikimedia.org --- Well, neither. Looking at the data, this appears to be caused by a proxy, but I do not think it is common software, as the percentage of data affected seems pretty small.

[Bug 69371] zero.log contains duplicate host in logs

2014-08-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=69371 --- Comment #6 from nuria nu...@wikimedia.org --- Parsing all zero.tsv* files I noticed a large number of other strange items - highly broken URLs that still return a miss/200 result. I think we are missing issues here. I will address this

[Bug 69371] zero.log contains duplicate host in logs

2014-08-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=69371 --- Comment #7 from nuria nu...@wikimedia.org --- I think we are missing issues here - I think we are MIXing issues

[Bug 69371] zero.log contains duplicate host in logs

2014-08-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=69371 --- Comment #8 from nuria nu...@wikimedia.org --- Please take a look at our logging format on Varnish: https://git.wikimedia.org/blob/operations%2Fpuppet/production/modules%2Fvarnish%2Ffiles%2Fvarnishncsa.default#L9 The interesting part:

[Bug 69371] zero.log contains duplicate host in logs

2014-08-11 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=69371 nuria nu...@wikimedia.org changed: What|Removed |Added CC||nu...@wikimedia.org ---

[Bug 69371] zero.log contains duplicate host in logs

2014-08-11 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=69371 --- Comment #2 from nuria nu...@wikimedia.org --- This problem is also present in the sampled, mobile and api logs for as long as records have been kept on stats1002, that is, since about May 2013. I imagine there must have been another bug fixed in this regard

[Bug 69371] zero.log contains duplicate host in logs

2014-08-11 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=69371 --- Comment #3 from Kevin Leduc kle...@wikimedia.org --- If you compare the number of occurrences to the number of lines in the file, this seems to happen less than 0.025% of the time. This seems insignificant. Could this just be corrupt data
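
The comparison Kevin describes (occurrences against total lines in the file) could be computed along these lines. This is a rough sketch only: the marker string "orghttp" is the one mentioned in Comment #11, while the function name `bad_host_rate` and the sample lines in the demo are hypothetical placeholders, not real log records.

```python
from typing import Iterable, Tuple


def bad_host_rate(lines: Iterable[str], marker: str = "orghttp") -> Tuple[int, int, float]:
    """Count lines containing `marker` and return (bad, total, percent).

    Works on any iterable of strings, so an open file handle can be passed
    directly without loading the whole log into memory.
    """
    total = 0
    bad = 0
    for line in lines:
        total += 1
        if marker in line:
            bad += 1
    percent = 100.0 * bad / total if total else 0.0
    return bad, total, percent


if __name__ == "__main__":
    # Made-up lines for illustration; real input would be a zero.tsv* file.
    sample = [
        "http://es.m.wikipedia.org/wiki/A",
        "http://es.m.wikipedia.orghttp://es.m.wikipedia.org/wiki/B",
        "http://en.m.wikipedia.org/wiki/C",
        "http://en.m.wikipedia.org/wiki/D",
    ]
    bad, total, percent = bad_host_rate(sample)
    print(f"{bad} of {total} lines affected ({percent:.3f}%)")
```

For a real file, something like `with open(path, errors="replace") as f: bad_host_rate(f)` would stream the lines; `path` here stands for whichever log file is being inspected.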

[Bug 69371] zero.log contains duplicate host in logs

2014-08-11 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=69371 --- Comment #4 from Yuri Astrakhan yu...@wikimedia.org --- Sure, we could ignore it, but I am worried that this could point to some bigger issue. Parsing all zero.tsv* files I noticed a large number of other strange items - highly broken URLs

[Bug 69371] zero.log contains duplicate host in logs

2014-08-11 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=69371 --- Comment #5 from Yuri Astrakhan yu...@wikimedia.org --- Created attachment 16166 -- https://bugzilla.wikimedia.org/attachment.cgi?id=16166&action=edit Graph of the bad host counts per day as detected in zero logs