[Bug 59222] Request to access redacted webproxy logfiles of (Tool) Labs

2014-08-27 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59222 metatron metat...@online.ms changed:
What           |Removed     |Added
Status         |UNCONFIRMED |NEW
Ever confirmed |0

2014-06-14 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59222 --- Comment #14 from Yuvi Panda yuvipa...@gmail.com --- @metatron: Help would be appreciated! I've copied a sample log scrubbed of IPs (with 1000 entries) to /shared/sample-nginx-log/cleaned-samplelog.log. If you can write a script (Python

2014-06-14 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59222 --- Comment #15 from Yuvi Panda yuvipa...@gmail.com --- Hopefully the 1000 log entries are enough. I can provide a larger sample if needed.

2014-06-14 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59222 --- Comment #16 from Yuvi Panda yuvipa...@gmail.com --- And I guess the format would be: 1. toolname, 2. url, 3. hits, 4. bytes. I wonder if we should actually augment this with other stats, such as: 5. error responses (non-200), 6. UAs. Perhaps
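
As a rough illustration of the per-tool summary proposed in comment #16 (tool name, URL, hits, bytes, plus non-200 responses), here is a minimal Python sketch. It assumes the request line, status and byte count have already been extracted from each log entry, and that the tool name is the first path segment of the URL (as in `/heritage/...`). The names `UrlStats`, `tool_and_url` and `summarize` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class UrlStats:
    hits: int = 0
    bytes_sent: int = 0
    errors: int = 0  # responses with any status other than 200, per comment #16


def tool_and_url(request: str) -> Tuple[str, str]:
    """Split a request line such as 'GET /heritage/api.php?x=1 HTTP/1.1'
    into ('heritage', '/heritage/api.php?x=1').

    Assumption: the tool name is the first path segment of the URL.
    """
    parts = request.split(" ")
    url = parts[1] if len(parts) >= 2 else "-"
    segments = url.split("?", 1)[0].strip("/").split("/")
    tool = segments[0] or "-"  # "/" (no segment) maps to "-"
    return tool, url


def summarize(records: Iterable[Tuple[str, int, int]]) -> Dict[Tuple[str, str], UrlStats]:
    """Aggregate (request_line, status, bytes) records per (tool, url)."""
    stats: Dict[Tuple[str, str], UrlStats] = defaultdict(UrlStats)
    for request, status, nbytes in records:
        entry = stats[tool_and_url(request)]
        entry.hits += 1
        entry.bytes_sent += nbytes
        if status != 200:
            entry.errors += 1
    return dict(stats)
```

Note that "non-200" as defined in the comment also counts redirects and 304 Not Modified responses; a real report might prefer to count only 4xx/5xx statuses.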

2014-06-06 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59222 --- Comment #5 from metatron metat...@online.ms --- Any progress on this? As already mentioned, both nginx proxies (domainproxy and urlsproxy) have gone live, so it should be trivial to run some sed to sanitize the logs and make them

2014-06-06 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59222 --- Comment #6 from Yuvi Panda yuvipa...@gmail.com --- I can make redacted logs available in a familiar pattern, with the following stripped out: 1. IP address, 2. Referrer fields. The only problem is that currently the proxy's logs are rotated

2014-06-06 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59222 --- Comment #7 from metatron metat...@online.ms --- Great! (UA and referrer would be fine, though, as they are already present in the tools' logs.) Concerning the archive, maybe one could steal some ideas for this from the production Varnishes ;-)

2014-06-06 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59222 --- Comment #8 from Yuvi Panda yuvipa...@gmail.com --- Hmm, I don't see any non-WMF referrers in the access.log (I looked at heritage's logs). Can someone verify/confirm?

2014-06-06 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59222 --- Comment #9 from Yuvi Panda yuvipa...@gmail.com --- After talking with Coren: Lighty's default format doesn't record referrers, but there's no particular reason for that, so I'll just strip out IPs.
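
Stripping only the client address can be sketched as follows, assuming the address is the first space-separated field of each line (as in Apache common/combined format and nginx's default access-log layout). The function name `redact_ip` and the `-` placeholder are illustrative choices, not part of any existing Tool Labs script.

```python
def redact_ip(line: str, placeholder: str = "-") -> str:
    """Replace the leading client-address field of an access-log line.

    Assumption: the client IP (v4 or v6) is the first space-separated field.
    A line with no space at all is replaced entirely by the placeholder,
    erring on the side of privacy.
    """
    _address, sep, rest = line.partition(" ")
    return placeholder + sep + rest  # trailing newline, if any, stays in `rest`
```

Applied line by line to a rotated log, this keeps every other field (including referrers) intact, and because the field is replaced rather than deleted, the line keeps its original field layout for downstream parsers.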

2014-06-06 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59222 --- Comment #10 from Yuvi Panda yuvipa...@gmail.com --- So, the current plan would be to: 1. have logrotate rotate the logs daily, 2. set up a post-processing script that runs after the rotation has happened and strips IPs (more probably, just
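
Step 2 of this plan might look roughly like the Python sketch below. The file names (`access.log.1`, a `public` output directory) and the function `publish_redacted` are hypothetical, and it again assumes the client IP is the first field of each line. It would be run after each rotation, for example from cron or a logrotate hook.

```python
import gzip
import tempfile
from pathlib import Path


def _redact(line: str) -> str:
    # Assumption: the client IP is the first space-separated field.
    _, sep, rest = line.partition(" ")
    return "-" + sep + rest


def publish_redacted(rotated: Path, public_dir: Path) -> Path:
    """Write an IP-stripped, gzip-compressed copy of a rotated log file.

    Accepts plain or already-gzipped input; the output is always .gz.
    """
    opener = gzip.open if rotated.suffix == ".gz" else open
    public_dir.mkdir(parents=True, exist_ok=True)
    out_path = public_dir / (rotated.name.removesuffix(".gz") + ".gz")
    with opener(rotated, "rt", encoding="utf-8", errors="replace") as src, \
            gzip.open(out_path, "wt", encoding="utf-8") as dst:
        for line in src:
            dst.write(_redact(line))
    return out_path


if __name__ == "__main__":
    # Demo on a throwaway directory with a made-up log line.
    with tempfile.TemporaryDirectory() as tmp:
        rotated = Path(tmp) / "access.log.1"
        rotated.write_text(
            '192.0.2.7 - - [06/Jun/2014:00:00:01 +0000] "GET /heritage/ HTTP/1.1" 200 10\n',
            encoding="utf-8",
        )
        published = publish_redacted(rotated, Path(tmp) / "public")
        with gzip.open(published, "rt", encoding="utf-8") as f:
            print(f.read(), end="")
```

A production version would probably write to a temporary name and rename it on completion, so that consumers never see a half-written file. `str.removesuffix` requires Python 3.9 or later.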

2014-06-06 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59222 --- Comment #11 from metatron metat...@online.ms --- Would it be possible to logrotate/process them on an hourly basis? Like: https://dumps.wikimedia.org/other/pagecounts-raw/2014/2014-06/ Just to be compatible and to allow a more fine-grained
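
Hourly processing mainly means assigning each entry to an hour. The sketch below buckets Common Log Format timestamps by UTC hour and names each bucket after the pattern we believe pagecounts-raw uses (`pagecounts-YYYYMMDD-HH0000.gz`). The `toolviews` prefix and the function names `hour_bucket` and `split_by_hour` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple

# Common Log Format timestamp, e.g. "06/Jun/2014:18:38:31 +0000".
# Note: %b relies on English month abbreviations (the default C locale).
CLF_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def hour_bucket(clf_time: str, prefix: str = "toolviews") -> str:
    """Return an hourly file name for a CLF timestamp, normalised to UTC."""
    ts = datetime.strptime(clf_time, CLF_TIME_FORMAT).astimezone(timezone.utc)
    return f"{prefix}-{ts:%Y%m%d-%H}0000.gz"


def split_by_hour(entries: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (clf_time, line) pairs by the hourly file they belong to."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for clf_time, line in entries:
        buckets[hour_bucket(clf_time)].append(line)
    return dict(buckets)
```

Normalising to UTC before bucketing means a proxy whose log timestamps carry a non-UTC offset still produces the same hourly file names as the dumps.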

2014-06-06 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59222 --- Comment #12 from metatron metat...@online.ms --- (In reply to metatron from comment #11) Would it be possible to logrotate/process them on an hourly basis? Like: https://dumps.wikimedia.org/other/pagecounts-raw/2014/2014-06/ Just to be

2014-06-06 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59222 --- Comment #13 from metatron metat...@online.ms --- If you need a helping hand, provide me with some 100k raw logs and I'll write a bash script with awk to summarize and format the logs exactly like the pageview dumps.
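
The same summarization can be sketched in Python instead of awk. As we understand it, pagecounts-raw lines have the layout `project page_title count bytes`; the sketch below maps that to `tool url hits bytes`, assuming the tool and URL have already been extracted from each log line. `pagecounts_format` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def pagecounts_format(rows: Iterable[Tuple[str, str, int]]) -> List[str]:
    """Aggregate (tool, url, bytes) rows into pagecounts-style lines.

    Each output line is 'tool url hits bytes', sorted by tool, then URL.
    """
    totals: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])
    for tool, url, nbytes in rows:
        entry = totals[(tool, url)]
        entry[0] += 1       # one request counts as one hit
        entry[1] += nbytes  # total bytes returned for this URL
    return [f"{tool} {url} {hits} {size}"
            for (tool, url), (hits, size) in sorted(totals.items())]
```

Space-separated output is safe here because URLs taken from a request line cannot contain unescaped spaces.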

2014-05-04 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59222 --- Comment #4 from metatron metat...@online.ms --- Now that the new YuviProxy is in place, I just need access to log dumps (with IPs stripped). sed and awk will do the rest of the job.

2014-01-08 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59222 --- Comment #3 from Ariel T. Glenn ar...@wikimedia.org --- Heh, I don't manage it; I just know where stuff that lands on dumps.wikimedia.org comes from. Just for the sake of clarification: we have logs written already that get saved someplace?

2014-01-06 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59222 --- Comment #1 from Marc A. Pelletier m...@uberbox.org --- That should be relatively simple to do. I do not, however, have the bandwidth to write this myself at this time. The logs are currently in Apache common format; if someone provides a
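
Since the logs are in Apache common format, parsing them is straightforward. The following is a minimal Python sketch (not an existing Tool Labs script) that reads the fields `host ident authuser [date] "request" status bytes`; because the pattern is anchored only at the start of the line, it also accepts combined-format lines with trailing referrer and user-agent fields. The names `parse_clf` and `LogEntry` are hypothetical, and the sample line is made up.

```python
import re
from typing import NamedTuple, Optional

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


class LogEntry(NamedTuple):
    host: str
    date: str
    request: str
    status: int
    bytes_sent: int  # a "-" (no body sent) is recorded as 0


def parse_clf(line: str) -> Optional[LogEntry]:
    """Parse one Common Log Format line; return None if it does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    raw_bytes = m.group("bytes")
    return LogEntry(
        host=m.group("host"),
        date=m.group("date"),
        request=m.group("request"),
        status=int(m.group("status")),
        bytes_sent=0 if raw_bytes == "-" else int(raw_bytes),
    )


if __name__ == "__main__":
    sample = '192.0.2.1 - - [06/Jun/2014:18:38:31 +0000] "GET /heritage/api.php HTTP/1.1" 200 5120'
    print(parse_clf(sample))
```

Returning `None` for non-matching lines lets a summarizing script count and skip malformed entries instead of aborting on them.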

2014-01-06 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=59222 --- Comment #2 from metatron metat...@online.ms --- (18:38:31) hedonil: YuviPanda: Coren: AFAIK apergos is the one who manages this log stuff in operations. maybe one could borrow some lines of his script, so that logfiles are summarized per