On Tue, May 27, 2014 at 9:37 AM, "Christian Müller" <[email protected]> wrote:
> Hi,
>
> a recent discussion in
> https://bugzilla.wikimedia.org/show_bug.cgi?id=65724#c3
> revealed that parts of the SVG standard are deliberately broken on
> commons. While I see some reasons to not adhere fully to the standard,
> e.g. external resources might break over time, if they are moved or
> deleted, I don't feel it's good to break the standard as hard as it's done
> right now. It puts a burden on creators, on the principle of sharing
> within the wikimedia environment and overall, it's even technically
> inferior and leads or might lead to useless duplication of content.
>

I'm far more concerned about the security/privacy issues than about an
external resource going away. The checks that you're hitting are likely the
security checks we do on the SVG.

>
> The SVG standard defines an image element. The image resource is linked
> to using the xlink:href attribute. Optionally the image is embedded into
> the SVG using the data URI scheme
> (https://en.wikipedia.org/wiki/Data_URI_scheme).
>
> Combining SVGs with traditional bitmap images is useful in several ways:
> It lets creators share the way an image is manipulated and eases future
> modifications that are hard or even impossible with traditional
> bitmap/photo editing. It basically has the same advantages that mash-up
> web content has over static content: Each layer or element can be modified
> individually without destroying the other elements. It's easy to see that
> a proper SVG is worth more to its potential users than a classic JPG or PNG
> with only one layer, being the result of all image operations.
>
> These reasons point out the necessity for barrier-free access to the image
> element.
>
> Currently, commons cripples this access laid out in the standard and
> originally implemented by "librsvg". It disables the handling of HTTP(S)
> resources.
> Users needing the same bitmap in more than one SVG are forced
> to base64-embed their source, and hence duplicate it, in each individual
> SVG. Indeed, there is quite some burden on creators and on wikimedia
> servers that duplicate lots of data right now and potentially even more in
> the future. Note that this duplication of data goes unnoticed by the
> routines specifically in place for bitmaps right now, that check uploads on
> MD5 collision and reject the upload on dup detection. Space might be cheap
> as long as donations are flowing, but reverting bad practice once it is
> common is harder than promoting good practice /now/ by adhering to the
> standard as closely as possible.
>
> Therefore I advocate change to librsvg in one of the two ways laid out in
> comment 3 of the bug report given above and (re)support linking to external
> bitmaps in SVGs. Two strategies that come to mind to prevent disappearance
> of an external resource in the web are:
>
> 1) cache external refs on thumbnail generation, check for updates on
> external server on thumbnail re-generation
>
> 2) allow external refs to images residing on wikimedia servers only
>
>
> Point 2) should be considered the easiest implementation, 1) is harder to
> implement but gives even more freedom to SVG creators and would adhere more
> closely to SVG standard. However, another argument for 2) would be the
> licensing issue: It ensures that only images are linked to that have been
> properly licensed by commons users and the upload process (and if a license
> violation is detected and the linked-to bitmap removed from commons, the
> SVG using such a bitmap breaks gracefully).
>

Having our servers do arbitrary calls to external resources (option 1)
isn't a realistic option from a security perspective. There are some fun
PoC SVG files that abuse this to scan a server's DMZ, attack other sites
with SQL injections, etc.
Trusting an image library to correctly speak HTTP without a memory
corruption seems a little scary as well, but I'll admit I haven't looked at
librsvg's code myself.

From a privacy perspective, we also don't want to allow the situation where
a reader's device is reaching out to a server that we don't control. So if
someone includes a link to the original SVG on a webpage, and any major
browser will pull those resources in and let an attacker see the user's IP
address, we shouldn't allow that... hmm, and now that I read the bug, I see
this is Firefox's behavior in the image you uploaded. We probably want to
block that behavior.

Allowing a whitelist of WMF domains via HTTPS may be possible. In general,
the security checking we do on uploaded files is complex enough that I
don't like adding another layer of specific checks and exceptions, but if
we can find a relatively simple way to do it that maintains our security
and privacy requirements, then I wouldn't stand in the way.

>
>
> Regards,
> Christian
>
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
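
To make the whitelist idea above a bit more concrete, here is a rough sketch
of what such a check might look like. It is only an illustration and not the
check MediaWiki actually performs: the host names in ALLOWED_HOSTS and the
function names are assumptions made up for this example, it only looks at
<image> elements (a real check would also have to cover <use>, CSS url()
references and so on), and it uses the standard-library XML parser, which is
not hardened against hostile input.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical allowlist; the actual set of WMF domains is not given in this thread.
ALLOWED_HOSTS = {"upload.wikimedia.org", "commons.wikimedia.org"}

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"


def href_is_allowed(href):
    """Accept embedded data: URIs and https URLs whose host is on the allowlist."""
    try:
        parts = urlsplit(href.strip())
    except ValueError:
        # Malformed URL (e.g. broken IPv6 literal): reject rather than guess.
        return False
    if parts.scheme == "data":
        return True
    # Relative and protocol-relative references have an empty scheme and are rejected.
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


def disallowed_image_refs(svg_text):
    """Return the href values of <image> elements that fail the allowlist check."""
    # NOTE: ElementTree is used for brevity; untrusted uploads would need a
    # parser hardened against entity-expansion attacks.
    root = ET.fromstring(svg_text)
    bad = []
    for image in root.iter("{%s}image" % SVG_NS):
        href = image.get("{%s}href" % XLINK_NS) or image.get("href")
        if href is not None and not href_is_allowed(href):
            bad.append(href)
    return bad
```

An upload would then be rejected whenever disallowed_image_refs() returns a
non-empty list. Comparing the parsed hostname, rather than matching on the
start of the URL string, is what keeps something like
https://upload.wikimedia.org@evil.example/ from slipping through.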
