https://bugzilla.wikimedia.org/show_bug.cgi?id=52647

--- Comment #3 from ErkDemon <[email protected]> ---
Hi Jesús!
This isn’t a conventional SEO "optimisation" problem, where one has to "do
one's part" to help a search engine to find pages – in this case Google already
knows all about the links that point to the description pages, it simply
refuses point-blank to follow those links and index the contents, presumably on
the basis that the apparent file type suffix doesn’t match the content, which
is a classic mechanism for delivering malware.

Another interpretation is that since obfuscated links can be used for security
purposes, perhaps Google shouldn’t be trying to index files that people are
trying to conceal.

Anyhow, whatever the reason - it seems that no matter how many times you tell
Google that your MW description pages exist using sitemaps, links, etc, it
won’t look at them.

And unfortunately, since the MW designers decided to hide all of a wiki’s
high-res images behind the description pages (instead of allowing thumbnail
code blocks to link to both the description page AND the source image), by
default your source files won’t show up on Google either.

You can override this behaviour for manually-placed images using the LINK=
workaround (which means training all your wiki’s contributors to add an extra
editing stage every time that they add an image), but that fix isn’t available
for category thumbnails.
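
For anyone who hasn't used it, the LINK= workaround on one manually-placed
image comes out roughly like the wikitext built below. This is only a sketch:
the helper name direct_link_image is made up for illustration, and it assumes
the {{filepath:}} parser function is available on your wiki. As far as I
recall from the MediaWiki help pages, link= is not honoured for "thumb" or
"frame" images, so the sketch emits a plain sized image instead.

```python
def direct_link_image(filename: str, width_px: int = 200) -> str:
    """Build wikitext for an inline image that links straight to the file.

    Hypothetical helper for illustration; not part of MediaWiki.
    """
    # Refuse names that would break out of the [[File:...]] syntax.
    if any(ch in filename for ch in "|[]{}"):
        raise ValueError("filename must not contain wikitext delimiters")
    # {{filepath:Name}} expands to the URL of the full-size file.
    return (
        "[[File:" + filename
        + "|" + str(width_px) + "px"
        + "|link={{filepath:" + filename + "}}]]"
    )


if __name__ == "__main__":
    print(direct_link_image("Example.jpg", 300))
```

Every contributor would have to type (or template) that extra link= part for
every image, which is exactly the training burden described above.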
---

Yes, bypassing MediaWiki and creating your own external sitemap or file-list
page will tell Google that your high-resolution files exist, but when those
images then turn up on an image search, and the user clicks on Google's button
to visit the page where the image is used, then there either won’t be a page to
go to (if you used the sitemap method, because no crawled page actually links
to the image), or there’ll be a link from Google to your "image list" page,
rather than to the pages where the image is actually used. The image has no
context unless there’s actually a link on the page where the thumbnails appear,
and the easiest way to do this is to add the direct link to the thumbnail code.

When we write a script to generate the "image list" page with all the direct
image links, we could include links to the info pages that can help human
visitors that land on the page via Google, but again, Google itself won’t
follow the link. We can also include an auto-generated link to the "What links
here" page for the picture’s info page, to provide the missing context to those
humans, but once again Google won’t read that link to build the network
information, this time because the "What links here" page has its robots flags
helpfully set to NOINDEX, NOFOLLOW.
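
A script of the kind described above could be sketched as follows. Treat it as
a rough illustration under assumptions, not a finished tool: the function name
image_list_html, the example.org URLs and the /wiki/ article path are all
placeholders (the real path depends on how the wiki is configured), and the
caller has to supply the file names and direct image URLs.

```python
from html import escape
from typing import Mapping
from urllib.parse import quote


def image_list_html(wiki_base: str, files: Mapping[str, str]) -> str:
    """Render an HTML list with three links per image.

    `files` maps a file name (without the "File:" prefix) to the direct
    URL of the full-size image. Hypothetical helper for illustration.
    """
    rows = []
    for name, image_url in sorted(files.items()):
        # MediaWiki titles use underscores in place of spaces.
        title = quote("File:" + name.replace(" ", "_"), safe=":")
        info_url = wiki_base + "/wiki/" + title
        links_url = wiki_base + "/wiki/Special:WhatLinksHere/" + title
        rows.append(
            "<li>"
            + '<a href="' + escape(image_url) + '">' + escape(name) + "</a> "
            + '(<a href="' + escape(info_url) + '">info page</a>, '
            + '<a href="' + escape(links_url) + '">what links here</a>)'
            + "</li>"
        )
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


if __name__ == "__main__":
    print(image_list_html(
        "https://wiki.example.org",
        {"Giant Grouper.jpg":
         "https://wiki.example.org/images/a/ab/Giant_Grouper.jpg"},
    ))
```

The second and third links only help human visitors; for the reasons given
above, a crawler gets nothing useful from either of them.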

I suppose that you could try to recreate the "crosslink" info by writing your
own custom spider, but then you’re practically writing your own image content
management system in parallel to MediaWiki, just to get external search to work
properly.

WIKIPEDIA:
So how does Wikipedia manage it?
It doesn’t. Wikipedia’s standard info pages are ignored by Google too. Looking
at the WP page for "Fish", and image searching for the "Giant Grouper" fish
file, Google will report copies of the full-size image (and the info page
preview) on multiple external sites, but it won’t give you a link to any of the
ones that are lodged on Wikipedia (other than their thumbnails). If we then
Google a chunk of text from the standard info page, Google doesn’t seem to have
that page in its indexes.
http://en.wikipedia.org/wiki/File:Georgia_Aquarium_-_Giant_Grouper_edit.jpg
isn’t on Google. What _is_ showing up on Google (for some reason) is the text
content for what I’m guessing is the mobile version of the page, at
https://en.m.wikipedia.org/wiki/File:Georgia_Aquarium_-_Giant_Grouper_edit.jpg  

What Wikipedia _have_ done that seems to be successful in increasing their
ranking for embedded images is to use the "srcset" attribute to embed direct
links to larger versions of the image ($wgResponsiveImages), which Google can
presumably read and follow. Those are intended to be higher-res versions of the
thumbnails for use with retina displays, etc, but it’s a way to at least create
a logical link between an image’s use and a version that’s twice as large as
the default embedded thumbnail. It’s just a bit of a shame if your thumbnailing
engine strips out all the original image’s metadata. And apparently the feature
doesn’t yet work for category images.  
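
To make the srcset point concrete, here is a minimal sketch of how any crawler
could pull the larger renditions out of an <img> tag. The class name
SrcsetCollector and the sample paths are invented for illustration, and
nothing here claims to be how Google actually does it. Splitting on commas is
a simplification that is good enough for typical thumbnail URLs but not for
every legal srcset value.

```python
from html.parser import HTMLParser


class SrcsetCollector(HTMLParser):
    """Collect (src, [(url, descriptor), ...]) for every <img> in a page."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        candidates = []
        # Simplified parsing: assumes the candidate URLs contain no commas.
        for part in (attr.get("srcset") or "").split(","):
            fields = part.split()
            if fields:
                descriptor = fields[1] if len(fields) > 1 else "1x"
                candidates.append((fields[0], descriptor))
        self.images.append((attr.get("src"), candidates))


if __name__ == "__main__":
    collector = SrcsetCollector()
    collector.feed(
        '<img src="/t/220px-Fish.jpg" '
        'srcset="/t/330px-Fish.jpg 1.5x, /t/440px-Fish.jpg 2x">'
    )
    for src, candidates in collector.images:
        print(src, candidates)
```

Note that what comes back is still a set of resized thumbnails, not the
original upload, which is why the metadata problem mentioned above remains.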

So, currently, MediaWiki’s image handling, as far as search engine visibility
is concerned, seems to be badly broken. Assuming that getting search engines to
read the dedicated info page is a lost cause, we can still easily fix the main
problem by giving site admins the option of globally overriding thumbnail
behaviour so that there’s some sort of direct link from the thumbnail image
block to the full file. It doesn’t have to be an obtrusive link, and it doesn’t
have to be the main link that’s used when the thumbnail is clicked, but the
information needs to be associated with that thumbnail region somehow, even if
it’s only as an additional attribute value. 
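
The information needed for such a link is cheap to come by. As a rough sketch,
assuming the common hashed upload layout (.../thumb/a/ab/Name.jpg/220px-Name.jpg
for the thumbnail and .../a/ab/Name.jpg for the original), the full file URL
can be derived from the thumbnail URL alone. The function name
original_from_thumb and the example.org host are made up, and wikis with a
different upload configuration will not match this pattern.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def original_from_thumb(thumb_url: str) -> Optional[str]:
    """Guess the full-size file URL from a thumbnail URL.

    Assumes the layout <prefix>/thumb/<h>/<hh>/<Name>/<rendered thumbnail>.
    Returns None when the URL does not fit that assumption.
    """
    parts = urlsplit(thumb_url)
    segments = parts.path.split("/")
    if "thumb" not in segments:
        return None
    i = segments.index("thumb")
    # After "thumb" we expect exactly: <h>, <hh>, <Name>, <rendered thumbnail>.
    if len(segments) - i != 5:
        return None
    # Drop the "thumb" segment and the trailing rendered-thumbnail segment.
    original_path = "/".join(segments[:i] + segments[i + 1:-1])
    return urlunsplit(parts._replace(path=original_path))


if __name__ == "__main__":
    print(original_from_thumb(
        "https://wiki.example.org/images/thumb/a/ab/Fish.jpg/220px-Fish.jpg"
    ))
```

MediaWiki itself obviously knows the real path without guessing, so emitting
it alongside the thumbnail would not need anything this indirect.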

MediaWiki is a great system for storing and making accessible text content, but
for most organisations interested in setting up a publishing system for image
files, if that system doesn’t make the files and metadata visible to Google,
then it’s not yet a working system. If you’re a museum or art gallery hoping to
put your image libraries online to increase your organisation’s visibility,
then a default Google ranking of zero is not acceptable, and until there’s a
better workaround, MW can’t be recommended to those organisations (unless they
have a damned good PHP hacker on their staff). I think that this is probably
one of the reasons why so many of these organisations are still ignoring MW and
using outside services like Flickr – it’s because, for their purposes, the
image section of MW isn’t yet compatible with the main tool that people use for
finding images online.

Needs fixing. Seriously.

