https://bugzilla.wikimedia.org/show_bug.cgi?id=52647
--- Comment #3 from ErkDemon <[email protected]> ---
Hi Jesús!

This isn’t a conventional SEO "optimisation" problem, where one has to "do one's part" to help a search engine find pages. In this case Google already knows all about the links that point to the description pages. It simply refuses point-blank to follow those links and index the contents, presumably because the apparent file-type suffix doesn’t match the content, which is a classic mechanism for delivering malware. Another interpretation is that, since obfuscated links can be used for security purposes, perhaps Google shouldn’t be trying to index files that people are trying to conceal.

Whatever the reason, it seems that no matter how many times you tell Google that your MW description pages exist (using sitemaps, links, etc.), it won’t look at them. And unfortunately, since the MW designers decided to hide all of a wiki’s high-res images behind the description pages (instead of allowing thumbnail code blocks to link to both the description page AND the source image by default), your source files won’t show up on Google either. You can override this behaviour for manually placed images using the LINK= workaround (which means training all your wiki’s contributors to add an extra editing step every time they add an image), but that fix isn’t available for category thumbnails.

---
Yes, bypassing MediaWiki and creating your own external sitemap or file-list page will tell Google that your high-resolution files exist. But when those images then turn up in an image search and the user clicks Google's button to visit the page where the image is used, either there won’t be a page to go to (if you used the sitemap method, because no crawled page actually links to the image), or Google will link to your "image list" page rather than to the pages where the image is actually used.
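For manually placed images, the LINK= workaround can be generated rather than typed by hand. This is a minimal Python sketch, assuming a default MediaWiki setup (hashed upload directories via $wgHashedUploadDirectory, $wgCapitalLinks on) and a hypothetical upload URL; the helper names are made up for illustration. It computes where the original file lives and writes thumbnail wikitext whose image links to that file, while a caption link keeps the route to the description page.

```python
import hashlib
from urllib.parse import quote

# Hypothetical upload URL; replace with your wiki's $wgServer + $wgUploadPath.
UPLOAD_BASE = "https://wiki.example.org/images"


def db_key(filename: str) -> str:
    """Normalise a file name as MediaWiki stores it: spaces become
    underscores and the first character is upper-cased (assumes the
    default $wgCapitalLinks = true)."""
    name = filename.strip().replace(" ", "_")
    return name[:1].upper() + name[1:]


def original_file_url(filename: str) -> str:
    """Direct URL of the full-size original, assuming the default hashed
    layout: <base>/<h[0]>/<h[0:2]>/<name>, where h is the MD5 hex digest
    of the stored name."""
    key = db_key(filename)
    h = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{UPLOAD_BASE}/{h[0]}/{h[:2]}/{quote(key)}"


def thumb_wikitext(filename: str, caption: str, width: int = 220) -> str:
    """Thumbnail wikitext whose image links straight to the original file
    (link=) while the caption keeps a link to the description page."""
    key = db_key(filename)
    return (
        f"[[File:{key}|thumb|{width}px|link={original_file_url(filename)}"
        f"|{caption} ([[:File:{key}|details]])]]"
    )


print(thumb_wikitext("Georgia Aquarium - Giant Grouper edit.jpg", "Giant grouper"))
```

This only automates the per-image edit; as noted above, it does nothing for category thumbnails.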
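The external-sitemap route can be sketched with Google's image sitemap extension, which lets each image URL be listed under the URL of a page where it appears. This is a minimal Python sketch; the page-to-image mapping is an assumed input that you would have to gather yourself, and the example URLs are hypothetical. Whether Google then treats the listed page as the image's landing page is not something the sketch can establish.

```python
import xml.etree.ElementTree as ET
from typing import Mapping, Sequence

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages: Mapping[str, Sequence[str]]) -> str:
    """Return an image sitemap listing, under each page URL, the image
    URLs used on that page."""
    # Serialise the sitemap namespace as the default and the image one as "image:".
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_image_sitemap({
    "https://wiki.example.org/wiki/Fish": [
        "https://wiki.example.org/images/a/ab/Grouper.jpg",
    ],
}))
```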
The image has no context unless there’s actually a link to it on the page where the thumbnails appear, and the easiest way to achieve that is to add the direct link to the thumbnail code. When we write a script to generate the "image list" page with all the direct image links, we could include links to the info pages to help human visitors who land on the page via Google, but again, Google itself won’t follow those links. We can also include an auto-generated link to each picture’s info-page "What links here" page to provide the missing context for those humans, but once again Google won’t follow that link to build the network information, this time because the "What links here" page has its robots flags helpfully set to NOFOLLOW, NOINDEX. I suppose one could try to recreate the "crosslink" information by writing a custom spider, but then you’re practically writing your own image content management system in parallel to MediaWiki, just to get external search to work properly.

WIKIPEDIA: So how does Wikipedia manage it? It doesn’t. Wikipedia’s standard info pages are ignored by Google too. Looking at the WP page for "Fish" and image-searching for the "Giant Grouper" file, Google reports copies of the full-size image (and of the info-page preview) on multiple external sites, but it won’t give you a link to any of the copies hosted on Wikipedia (other than their thumbnails). If we then Google a chunk of text from the standard info page, Google doesn’t seem to have that page in its index: http://en.wikipedia.org/wiki/File:Georgia_Aquarium_-_Giant_Grouper_edit.jpg isn’t on Google.
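A generated "image list" page of the kind described above might look like this minimal Python sketch. It assumes you already have each file's original URL (for instance from the MediaWiki API's imageinfo `url` property) and uses a hypothetical article path; the function names are illustrative only. Each entry carries the direct file link plus links to the description page and to Special:WhatLinksHere, which, as noted above, Google may decline to follow.

```python
from html import escape
from typing import Iterable, Tuple
from urllib.parse import quote

# Hypothetical article path; replace with your wiki's $wgArticlePath.
WIKI = "https://wiki.example.org/wiki/"


def page_url(title: str) -> str:
    """URL of a wiki page, with spaces as underscores and the title
    percent-encoded (':' and '/' kept readable)."""
    return WIKI + quote(title.replace(" ", "_"), safe=":/")


def image_list_html(files: Iterable[Tuple[str, str]]) -> str:
    """One list item per (file name, original URL) pair: a direct link to
    the full-size file, plus links to its description page and to the
    Special:WhatLinksHere page showing where it is used."""
    items = []
    for name, original_url in files:
        title = "File:" + name
        items.append(
            f'<li><a href="{escape(original_url)}">{escape(name)}</a> '
            f'(<a href="{escape(page_url(title))}">description page</a>, '
            f'<a href="{escape(page_url("Special:WhatLinksHere/" + title))}">'
            "used on</a>)</li>"
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```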
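To check which robots directives a page such as "What links here" actually sends, a small parser over the page's HTML is enough. This is a minimal Python sketch using only the standard library; fetching the page is left out (pass in the HTML you retrieved), and treating a "googlebot" meta tag like "robots" is an assumption about what you want to inspect.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> (and "googlebot") tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() not in ("robots", "googlebot"):
            return
        # Directives are comma-separated, e.g. content="noindex,nofollow".
        for token in values.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> frozenset:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return frozenset(parser.directives)
```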
What _is_ showing up on Google (for some reason) is the text content of what I’m guessing is the mobile version of the page, at https://en.m.wikipedia.org/wiki/File:Georgia_Aquarium_-_Giant_Grouper_edit.jpg

What Wikipedia _have_ done that seems to be successful in improving the ranking of embedded images is to use the "srcset" attribute to embed direct links to larger versions of the image ($wgResponsiveImages), which Google can presumably read and follow. These are intended as higher-res versions of the thumbnails for retina displays and the like, but they at least create a logical link between an image’s use and a version that’s twice as large as the default embedded thumbnail. It’s just a bit of a shame if your thumbnailing engine strips out all of the original image’s metadata. And apparently the feature doesn’t yet work for category images.

So, currently, MediaWiki’s image handling seems to be badly broken as far as search-engine visibility is concerned. Assuming that getting search engines to read the dedicated info page is a lost cause, we can still easily fix the main problem by giving site admins the option of globally overriding thumbnail behaviour so that there’s some sort of direct link from the thumbnail image block to the full file. It doesn’t have to be an obtrusive link, and it doesn’t have to be the main link used when the thumbnail is clicked, but the information needs to be associated with the thumbnail region somehow, even if only as an additional attribute value. MediaWiki is a great system for storing text content and making it accessible, but for most organisations interested in setting up a publishing system for image files, a system that doesn’t make the files and metadata visible to Google isn’t yet a working system.
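The srcset approach behind $wgResponsiveImages comes down to markup like that produced by this minimal Python sketch. The thumbnail path pattern `<thumb_dir>/<N>px-<name>` follows MediaWiki's usual thumb URL layout, but the function, its parameters and the fallback to the original file when a scaled version would be at least as wide as it are assumptions for illustration, not MediaWiki's actual code.

```python
from html import escape


def responsive_img(thumb_dir: str, name: str, original_url: str,
                   width: int, original_width: int) -> str:
    """Build an <img> tag for a `width`-px thumbnail with 1.5x and 2x
    srcset candidates. Thumbnails are assumed to live at
    <thumb_dir>/<N>px-<name>; a candidate that would be as wide as the
    original (or wider) points at the original file instead."""

    def thumb(w: int) -> str:
        return f"{thumb_dir}/{w}px-{name}"

    candidates = []
    for factor in (1.5, 2):
        scaled = round(width * factor)
        url = thumb(scaled) if scaled < original_width else original_url
        candidates.append(f"{url} {factor}x")
    srcset = ", ".join(candidates)
    return (f'<img src="{escape(thumb(width))}" width="{width}" '
            f'srcset="{escape(srcset)}" alt="">')
```

The fallback branch is where a direct link to the full-size original would end up in the page markup.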
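The global thumbnail override proposed above could produce markup along these lines. This is a minimal Python sketch of the target HTML only, not MediaWiki's real thumbnail output: in MediaWiki itself the change would have to be made in PHP, and the class names and the `data-original` attribute are hypothetical. A data attribute on its own may not be treated as a link by crawlers, which is why the sketch also adds a plain caption link.

```python
from html import escape


def thumb_figure(description_url: str, thumb_url: str, original_url: str,
                 caption: str, width: int) -> str:
    """Thumbnail block whose main click target stays the description page,
    with the full-size file exposed as a data attribute and as a small,
    unobtrusive caption link."""
    d, t, o = escape(description_url), escape(thumb_url), escape(original_url)
    c = escape(caption)
    return (
        f'<figure class="thumb" data-original="{o}">'
        f'<a href="{d}"><img src="{t}" width="{width}" alt="{c}"></a>'
        f'<figcaption>{c} <a class="full-size" href="{o}">(full size)</a>'
        "</figcaption></figure>"
    )
```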
If you’re a museum or art gallery hoping to put your image libraries online to increase your organisation’s visibility, then a default Google ranking of zero is not acceptable, and until there’s a better workaround, MW can’t be recommended to those organisations (unless they have a damned good PHP hacker on staff). I think this is probably one of the reasons why so many of these organisations still ignore MW and use outside services like Flickr: for their purposes, the image side of MW isn’t yet compatible with the main tool that people use for finding images online. Needs fixing. Seriously.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
