https://bugzilla.wikimedia.org/show_bug.cgi?id=69371

--- Comment #4 from Yuri Astrakhan <[email protected]> ---
Sure, we could ignore it, but I am worried that this could point to some bigger
issue.  Parsing all zero.tsv* files, I noticed a large number of other strange
items - highly broken URLs that still return a miss/200 result, some of which
are images.  Please take a look at my home dir (I think it's public) at

  yurik@stat1002:~/zero-sms/scripts$ grep '/200' *.txt

Various IP addresses and random hosts (e.g. 0.facebook.com) keep appearing and
get resolved just fine by the backend, even though they clearly shouldn't be.

My code now does a regex substitution on all URLs:  ^(https?://.+)\1   ->   \1
Even not counting those, there are 35,000 other host-parsing errors in the
logs, with some big spikes (I will attach a graph of them).
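For illustration, a minimal Python sketch of that substitution - collapsing a doubled URL prefix down to a single copy. The function name and sample URL are hypothetical, not from the actual zero-sms scripts:

```python
import re

# The pattern from above: a URL prefix immediately followed by an exact
# repeat of itself.  \1 is a backreference to the captured prefix.
DOUBLED_PREFIX = re.compile(r'^(https?://.+)\1')

def dedupe_url(url):
    """Replace a doubled leading URL prefix with a single copy (^(https?://.+)\\1 -> \\1)."""
    return DOUBLED_PREFIX.sub(r'\1', url)
```

A URL without a doubled prefix passes through unchanged, since the backreference cannot match.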

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l