Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
K. Peachey wrote: Taking advantage of this thread. Take a look at http://es.wikipedia.org/wiki/Special:Search Can those images be moved to a server under our control (WMF, Toolserver...)? Potentially, those servers could be retrieving the search queries and IPs of all visitors via the referer. Originally, some of those images were hosted by WMF-FR, but that was stopped because it overloaded their server. They should be hosted like a standard image with a FUR and then included via an interface message; the servers would then cache them, so load shouldn't be an issue (although it might be wiser to ask someone more familiar with MediaWiki/Wikipedia about that). eswiki doesn't allow fair-use images. It doesn't even allow local uploads. I could upload them at Commons, but they would be deleted within hours. Uploading at enwiki might be an option, but not a good one: English Wikipedia images are not there to serve other projects, and although they might survive longer, they would eventually be deleted. Another option would be the Toolserver, but: a) I don't have an account there (yet). b) I was afraid of producing too much load on the Toolserver. Also, it might need a special provision, per Duesentrieb's email (that may need to be clarified at the rules page). ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
Gregory Maxwell wrote: It makes absolutely no sense for ES wiki to forbid a particular kind of image but then allow it to be inlined, unless it's some file the WMF couldn't legally host (and also wouldn't forbid as a part of the WP content). One of the main purposes in restricting non-free materials is to keep the content of the WP freely available… from the perspective of a user of Wikipedia, a remotely loaded inline image is equivalent to a locally hosted image, except that the remotely loaded image violates their privacy. If ES wouldn't permit the upload for this purpose, then it ought not permit the inline injection. That's a reasonable point. It's a matter of discussion for the community, though. I agree that those remotely loaded images are undesirable. But the inline images at the search page are conceptually different from those in articles. The images at the search page belong to the skin. Not to mention that uploading into a wiki where they shouldn't otherwise be allowed will inevitably lead to someone trying to use them in articles.
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
David Gerard wrote: Keeping well-meaning admins from putting Google web bugs in the JavaScript is a game of whack-a-mole. Are there any technical workarounds feasible? If not blocking the loading of external sites entirely (I understand hu:wp uses a web bug that isn't Google), perhaps at least listing the sites somewhere centrally viewable? - d. Make a filter for Google Analytics (plus all other known web bugs). When it matches, have an admin look at that user's contributions (you can just log it and have the admin review it daily, or build a more complex mail notification system). Even if you blocked it, an admin could bypass any filter; a sysadmin reviewing the added code cannot be fooled so easily. Taking advantage of this thread. Take a look at http://es.wikipedia.org/wiki/Special:Search Can those images be moved to a server under our control (WMF, Toolserver...)? Potentially, those servers could be retrieving the search queries and IPs of all visitors via the referer. Originally, some of those images were hosted by WMF-FR, but that was stopped because it overloaded their server.
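A minimal sketch of the filter idea, assuming a hypothetical hook that receives the text of an edit to a sitewide JS page; the hostname list and function name are illustrative assumptions, not an actual blacklist or MediaWiki API:

```javascript
// Hypothetical list of known web-bug hosts to flag for review.
const TRACKER_HOSTS = [
  'google-analytics.com',
  'www.google-analytics.com',
  'ssl.google-analytics.com',
  'statcounter.com',
];

// Return the flagged hostnames found in a piece of site JS.
// A real filter might parse the script, but a plain substring scan
// is enough to surface suspicious edits for human review.
function flagTrackerHosts(jsText) {
  return TRACKER_HOSTS.filter(function (host) {
    return jsText.indexOf(host) !== -1;
  });
}
```

Matches would not block the edit; they would only log it so an admin (or sysadmin) could review the contributor's other edits, as suggested above.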
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
David Gerard wrote: Keeping well-meaning admins from putting Google web bugs in the JavaScript is a game of whack-a-mole. Are there any technical workarounds feasible? If not blocking the loading of external sites entirely (I understand hu:wp uses a web bug that isn't Google), perhaps at least listing the sites somewhere centrally viewable? Perhaps the solution would be to simply set up our own JS-based usage tracker? There are a few options available (http://en.wikipedia.org/wiki/List_of_web_analytics_software), and for starters, the backend could run on the Toolserver. Note that anything processing IP addresses will need special approval on the TS. -- daniel
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
I suggest keeping the bug on Wikimedia's servers and using a tool that relies on SQL databases. These could be shared with the Toolserver, where the official version of the analysis tool runs and users can run their own queries (so picking a tool with a good database structure would be nice). With that, Toolserver users could set up their own cool tools on that data. On Thu, Jun 4, 2009 at 4:34 PM, David Gerard dger...@gmail.com wrote: 2009/6/4 Daniel Kinzler dan...@brightbyte.de: David Gerard wrote: Keeping well-meaning admins from putting Google web bugs in the JavaScript is a game of whack-a-mole. Are there any technical workarounds feasible? If not blocking the Perhaps the solution would be to simply set up our own JS-based usage tracker? There are a few options available (http://en.wikipedia.org/wiki/List_of_web_analytics_software), and for starters, the backend could run on the Toolserver. Note that anything processing IP addresses will need special approval on the TS. If putting that on the Toolserver passes privacy policy muster, that'd be an excellent solution. Then external site loading can be blocked. (And if the toolservers won't melt in the process.) - d.
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
On Thu, Jun 4, 2009 at 10:19 AM, David Gerard dger...@gmail.com wrote: Keeping well-meaning admins from putting Google web bugs in the JavaScript is a game of whack-a-mole. Are there any technical workarounds feasible? If not blocking the loading of external sites entirely (I understand hu:wp uses a web bug that isn't Google), perhaps at least listing the sites somewhere centrally viewable? Restrict site-wide JS and raw HTML injection to a smaller subset of users who have been specifically schooled in these issues. This approach is also compatible with other approaches. It has the advantage of being simple to implement and should produce a considerable reduction in problems regardless of the underlying cause. Just be glad no one has yet turned English Wikipedia's readers into their own personal DDoS drone network.
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
Michael Rosenthal wrote: I suggest keeping the bug on Wikimedia's servers and using a tool that relies on SQL databases. These could be shared with the Toolserver, where the official version of the analysis tool runs and users can run their own queries (so picking a tool with a good database structure would be nice). With that, Toolserver users could set up their own cool tools on that data. If JavaScript were used to serve the bug, it would be quite easy to load the bug only some small fraction of the time, allowing a fair statistical sample of JS-enabled readers (who should, I hope, be fairly representative of the whole population) to be taken without melting down the servers. I suspect the fact that most bots and spiders do not interpret JavaScript, and would thus be excluded from participating in the traffic survey, could be regarded as an added bonus. -- Neil
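Neil's sampling idea might look roughly like this in sitewide JS; the 1:1000 rate, the beacon path, and the function names are all placeholder assumptions, and the sampling decision is factored out as a pure function so it can be tested:

```javascript
// Pure sampling decision: does this random draw fall inside the sample?
function inSample(randomDraw, rate) {
  return randomDraw < rate;
}

// In sitewide JS, a 1:1000 sample of JS-enabled readers could then
// request a local (Wikimedia-hosted) beacon image on page load.
function maybeSendBeacon() {
  if (inSample(Math.random(), 0.001)) {
    var img = new Image();
    img.src = '/beacon.gif?page=' + encodeURIComponent(location.pathname);
  }
}
```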
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
2009/6/4 Mike.lifeguard mikelifegu...@fastmail.fm: On Thu, 2009-06-04 at 15:34 +0100, David Gerard wrote: Then external site loading can be blocked. Why do we need to block loading from all external sites? If there are specific problematic ones (like google analytics) then why not block those? Because having the data go outside Wikimedia at all is a privacy policy violation, as I understand it (please correct me if I'm wrong). - d.
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
On Thu, Jun 4, 2009 at 10:53 AM, David Gerard dger...@gmail.com wrote: I understand the problem with stats before was that the stats server would melt under the load. Leon's old wikistats page sampled 1:1000. The current stats (on dammit.lt and served up nicely on http://stats.grok.se) are every hit, but I understand (Domas?) that it was quite a bit of work to get the firehose of data into such a form as not to melt the receiving server trying to process it. OK, then the problem becomes: how to set up something like stats.grok.se feasibly internally for all the other data gathered from a hit? (Modulo stuff that needs to be blanked per privacy policy.) What exactly are people looking for that isn't available from stats.grok.se and isn't a privacy concern? I had assumed that people kept installing these bugs because they wanted source network breakdowns per-article and other clear privacy violations.
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
On Thu, Jun 4, 2009 at 11:01 AM, Mike.lifeguard mikelifegu...@fastmail.fm wrote: On Thu, 2009-06-04 at 15:34 +0100, David Gerard wrote: Then external site loading can be blocked. Why do we need to block loading from all external sites? If there are specific problematic ones (like google analytics) then why not block those? Because: (1) External loading results in an uncontrolled leak of private reader and editor information to third parties, in contravention of the privacy policy as well as basic ethical operating principles. (1a) Most external loading script usage will also defeat users' choice of SSL and leak more information about their browsing to their local network. It may also bypass any Wikipedia-specific anonymization proxies they are using to keep their reading habits private. (2) External loading produces a runtime dependency on third-party sites. Some other site goes down and our users experience some kind of loss of service. (3) The availability of external loading makes Wikimedia a potential source of very significant DDoS attacks, intentional or otherwise. That's not to say that there aren't reasons to use remote loading, but the potential harms mean that it should probably be a default-deny, permit-by-exception process rather than the other way around.
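The default-deny, permit-by-exception approach could be sketched as a URL check against an explicit allow list; the listed domains are hypothetical examples, not actual Wikimedia configuration:

```javascript
// Hosts (and their subdomains) that external loads may come from.
const ALLOWED_HOSTS = ['wikipedia.org', 'wikimedia.org', 'toolserver.org'];

// Default-deny check: relative URLs stay on the wiki and pass; absolute
// URLs pass only if their host is on the allow list; anything else
// (unknown schemes, unlisted hosts) is rejected.
function isAllowedUrl(url) {
  if (!/^[a-z][a-z0-9+.-]*:/i.test(url)) return true; // relative URL
  var m = /^https?:\/\/([^\/:?#]+)/i.exec(url);
  if (!m) return false; // non-HTTP scheme or malformed URL
  var host = m[1].toLowerCase();
  return ALLOWED_HOSTS.some(function (d) {
    return host === d || host.slice(-(d.length + 1)) === '.' + d;
  });
}
```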
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
David Gerard wrote: 2009/6/4 Gregory Maxwell gmaxw...@gmail.com: Restrict site-wide JS and raw HTML injection to a smaller subset of users who have been specifically schooled in these issues. Is it feasible to allow admins to use raw HTML as appropriate but not raw JS? Being able to fix MediaWiki: space messages with raw HTML is way too useful on the occasions where it's useful. Possible yes, sensible no: if you can edit raw HTML, you can inject JavaScript. -- daniel
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
David Gerard wrote: 2009/6/4 Mike.lifeguard mikelifegu...@fastmail.fm: On Thu, 2009-06-04 at 15:34 +0100, David Gerard wrote: Then external site loading can be blocked. Why do we need to block loading from all external sites? If there are specific problematic ones (like google analytics) then why not block those? Because having the data go outside Wikimedia at all is a privacy policy violation, as I understand it (please correct me if I'm wrong). I agree with that, *especially* if it's for the purpose of aggregating data about users. -- daniel
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
On Thu, Jun 4, 2009 at 17:00, Gregory Maxwell gmaxw...@gmail.com wrote: On Thu, Jun 4, 2009 at 10:53 AM, David Gerard dger...@gmail.com wrote: I understand the problem with stats before was that the stats server would melt under the load. Leon's old wikistats page sampled 1:1000. The current stats (on dammit.lt and served up nicely on http://stats.grok.se) are every hit, but I understand (Domas?) that it was quite a bit of work to get the firehose of data into such a form as not to melt the receiving server trying to process it. OK, then the problem becomes: how to set up something like stats.grok.se feasibly internally for all the other data gathered from a hit? (Modulo stuff that needs to be blanked per privacy policy.) What exactly are people looking for that isn't available from stats.grok.se and isn't a privacy concern? I had assumed that people kept installing these bugs because they wanted source network breakdowns per-article and other clear privacy violations. On top of views/page I'd be interested in keywords used, entry/exit points, path analysis when people are editing (do they save/leave/try to find help/...), #edit starts, #submitted edits that don't get saved. henna -- Maybe you knew early on that your track went from point A to B, but unlike you I wasn't given a map at birth! Alyssa, Chasing Amy
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
Neil Harris wrote: Daniel Kinzler wrote: David Gerard wrote: 2009/6/4 Gregory Maxwell gmaxw...@gmail.com: Restrict site-wide JS and raw HTML injection to a smaller subset of users who have been specifically schooled in these issues. Is it feasible to allow admins to use raw HTML as appropriate but not raw JS? Being able to fix MediaWiki: space messages with raw HTML is way too useful on the occasions where it's useful. Possible yes, sensible no: if you can edit raw HTML, you can inject JavaScript. -- daniel Not if you sanitize the HTML after the fact: just cleaning out script tags and elements from the HTML stream should do the job. After this has been done to the user-generated content, the desired locked-down script code can then be inserted at the final stages of page generation. -- Neil Come to think of it, you could also allow the carefully vetted loading of scripts from a very limited whitelist of Wikimedia-hosted and controlled domains and paths, when performing that sanitization. Inline scripts remain a bad idea: there are just too many ways to obfuscate them and/or inject data into them to have any practical prospect of limiting them to safe features without heroic efforts. However, writing a JavaScript sanitizer that restricted the user to a safe subset of the language, by first parsing and then resynthesizing the code using formal methods for validation, in a way similar to the current solution for TeX, would be an interesting project! -- Neil
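As a toy illustration of the script-stripping step Neil describes (not a production sanitizer: regex stripping is notoriously bypassable, and a real sanitizer such as MediaWiki's own parses the markup and whitelists elements and attributes instead):

```javascript
// Remove <script> elements, inline on* event handlers, and
// javascript: URLs from an HTML string. Illustration only; see the
// caveat above about regex-based sanitization.
function stripScripts(html) {
  return html
    // whole <script>...</script> elements
    .replace(/<script\b[\s\S]*?<\/script\s*>/gi, '')
    // inline event handler attributes like onclick=... or onerror="..."
    .replace(/\son\w+\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/gi, '')
    // javascript: URLs in href/src attributes
    .replace(/(\shref|\ssrc)\s*=\s*(["']?)\s*javascript:[^"'>\s]*\2/gi, '');
}
```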
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
On 04/06/2009, at 4:08 PM, Daniel Kinzler wrote: David Gerard wrote: 2009/6/4 Gregory Maxwell gmaxw...@gmail.com: Restrict site-wide JS and raw HTML injection to a smaller subset of users who have been specifically schooled in these issues. Is it feasible to allow admins to use raw HTML as appropriate but not raw JS? Being able to fix MediaWiki: space messages with raw HTML is way too useful on the occasions where it's useful. Possible yes, sensible no. Because if you can edit raw html, you can inject javascript. When did we start treating our administrators as potentially malicious attackers? Any administrator could, in theory, add a cookie-stealing script to my user JS, steal my account, and grant themselves any rights they please. We trust our administrators. If we don't, we should move the editinterface right further up the chain. -- Andrew Garrett Contract Developer, Wikimedia Foundation agarr...@wikimedia.org http://werdn.us
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
Thanks, that clarifies matters for me. I wasn't aware of #1, though I guess upon reflection that makes sense. -Mike On Thu, 2009-06-04 at 11:07 -0400, Gregory Maxwell wrote: On Thu, Jun 4, 2009 at 11:01 AM, Mike.lifeguard mikelifegu...@fastmail.fm wrote: On Thu, 2009-06-04 at 15:34 +0100, David Gerard wrote: Then external site loading can be blocked. Why do we need to block loading from all external sites? If there are specific problematic ones (like google analytics) then why not block those? Because: (1) External loading results in an uncontrolled leak of private reader and editor information to third parties, in contravention of the privacy policy as well as basic ethical operating principles. (1a) most external loading script usage will also defeat users choice of SSL and leak more information about their browsing to their local network. It may also bypass any wikipedia specific anonymization proxies they are using to keep their reading habits private. (2) External loading produces a runtime dependency on third party sites. Some other site goes down and our users experience some kind of loss of service. (3) The availability of external loading makes Wikimedia a potential source of very significant DDOS attacks, intentional or otherwise. Thats not to say that there aren't reasons to use remote loading, but the potential harms mean that it should probably be a default-deny permit-by-exception process rather than the other way around.
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
On Thu, 2009-06-04 at 17:04 +0100, Andrew Garrett wrote: When did we start treating our administrators as potentially malicious attackers? Any administrator could, in theory, add a cookie-stealing script to my user JS, steal my account, and grant themselves any rights they please. We trust our administrators. If we don't, we should move the editinterface right further up the chain. They are potentially malicious attackers, but we nevertheless trust them not to do bad things. "We" in this case refers only to most of Wikimedia, I guess, since there has been no shortage of paranoia both on bugzilla and this list recently; a sad state of affairs, to be sure. -Mike
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
How does installing third-party analytics software help the WMF accomplish its goals? On Thu, Jun 4, 2009 at 8:31 AM, Daniel Kinzler dan...@brightbyte.de wrote: David Gerard wrote: Keeping well-meaning admins from putting Google web bugs in the JavaScript is a game of whack-a-mole. Are there any technical workarounds feasible? If not blocking the loading of external sites entirely (I understand hu:wp uses a web bug that isn't Google), perhaps at least listing the sites somewhere centrally viewable? Perhaps the solution would be to simply set up our own JS-based usage tracker? There are a few options available (http://en.wikipedia.org/wiki/List_of_web_analytics_software), and for starters, the backend could run on the Toolserver. Note that anything processing IP addresses will need special approval on the TS. -- daniel
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
2009/6/4 Finne Boonen hen...@gmail.com: On Thu, Jun 4, 2009 at 17:00, Gregory Maxwell gmaxw...@gmail.com wrote: What exactly are people looking for that isn't available from stats.grok.se and isn't a privacy concern? I had assumed that people kept installing these bugs because they wanted source network breakdowns per-article and other clear privacy violations. On top of views/page I'd be interested in keywords used, entry/exit points, path analysis when people are editing (do they save/leave/try to find help/...), #edit starts, #submitted edits that don't get saved. Path analysis is a big one. All that other stuff, if it won't violate privacy, would be fantastically useful to researchers, internal and external, in ways we won't have even thought of yet, and would help us considerably to improve the projects. (This would have to be given considerable thought from a security/hacker mindset; e.g. even with IPs stripped, listing user pages and user page edits would likely give away an identity. Talk pages may do the same. Those are just off the top of my head; I'm sure someone has already made a list of what they could work out even with IPs anonymised or even stripped.) - d.
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
2009/6/4 Andrew Garrett agarr...@wikimedia.org: When did we start treating our administrators as potentially malicious attackers? Any administrator could, in theory, add a cookie-stealing script to my user JS, steal my account, and grant themselves any rights they please. That's why I started this thread talking about things being done right now by well-meaning admins :-) - d.
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
On Thu, Jun 4, 2009 at 11:56 AM, Neil Harrisuse...@tonal.clara.co.uk wrote: However, writing a JavaScript sanitizer that restricted the user to a safe subset of the language, by first parsing and then resynthesizing the code using formal methods for validation, in a way similar to the current solution for TeX, would be an interesting project! Interesting, but probably not very useful. If we restricted JavaScript the way we restrict TeX, we'd have to ban function definitions, loops, conditionals, and most function calls. I suspect you'd have to make it pretty much unusable to make output of specific strings impossible. On Thu, Jun 4, 2009 at 12:45 PM, Gregory Maxwellgmaxw...@gmail.com wrote: Regarding HTML sanitation: raw HTML alone, without JS, is enough to violate users' privacy: just add a hidden image tag pointing to a remote site. Yes, you could sanitize out various bad things, but then that's not raw HTML anymore, is it? It might be good enough for the purposes at hand, though. What are the use cases for wanting raw HTML in messages, instead of wikitext or plaintext?
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
2009/6/4 Brian brian.min...@colorado.edu: How does installing third-party analytics software help the WMF accomplish its goals? Detailed analysis of how users actually use the site would be vastly useful in improving the sites' content and usability. - d.
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
On Thu, Jun 4, 2009 at 2:32 PM, David Gerard dger...@gmail.com wrote: 2009/6/4 Gregory Maxwell gmaxw...@gmail.com: I think the biggest problem to reducing accesses is that far more MediaWiki messages are uncooked than is needed. Were it not for this, I expect this access would have been curtailed somewhat a long time ago. I think you've hit the actual problem there. Is there someone with too much time on their hands who could go through all of the MediaWiki: space to see what really needs to be HTML rather than wikitext? - d. See bug 212[1], which is (sort of) a tracker for the wikitext-ification of the messages. -Chad [1] https://bugzilla.wikimedia.org/show_bug.cgi?id=212
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
That's why WMF now has a usability lab. On Thu, Jun 4, 2009 at 12:34 PM, David Gerard dger...@gmail.com wrote: 2009/6/4 Brian brian.min...@colorado.edu: How does installing third-party analytics software help the WMF accomplish its goals? Detailed analysis of how users actually use the site would be vastly useful in improving the sites' content and usability. - d.
Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?
2009/6/4 Brian brian.min...@colorado.edu: That's why WMF now has a usability lab. Yep. They'd dive on this stuff with great glee if we can implement it without breaking privacy or melting servers. - d.