https://bugzilla.wikimedia.org/show_bug.cgi?id=52647

       Web browser: ---
            Bug ID: 52647
           Summary: MediaWiki images not being indexed properly by
                    external search engines
           Product: MediaWiki
           Version: 1.19.2
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: major
          Priority: Unprioritized
         Component: General/Unknown
          Assignee: [email protected]
          Reporter: [email protected]
    Classification: Unclassified
   Mobile Platform: ---

===Problem:===
The conventions used by MediaWiki for dealing with uploaded images seem to
result in the uploaded images and their description pages not being indexed by
Google by default.

===Suggested fix:===
Add an optional configuration switch that forces the default link for every
thumbnail to be the URL of the original file rather than its description
page.

Thumbnails created using "File:" already include a small additional icon that
always points to the description page, so there would still be description page
links alongside each thumbnail ... however, we'd need to apply these
description-page icons to the auto-generated thumbnails that appear on Category
pages, too.

Not all wiki owners would want this change, so it'd need to be "opt-in".

===Presumed (speculated) reason for failure:===
Normal default behaviour on a website is for a clickable thumbnail image to
link directly to the full version of the file.

MediaWiki breaks this convention in order to have an intermediate page to hold
additional metadata and history information about the image. Unfortunately, the
URL for this additional page ends in an image file identifier (''e.g.''
".jpg"), which means that search engines may have an understandable tendency to
assume that the resource being linked to is an image file rather than HTML/XML. 

It seems that the default behaviour for the search engine is then NOT to
attempt to explore the innards of the "faux .jpg" file but to pass the URL to
its image-indexing routines. These then attempt to load the file that
corresponds to the description page, recognise immediately from the header that
this is ''not'' an image file, and discard it.
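
The speculated behaviour can be sketched in a few lines of Python. This is only a model of the guess made above, not a description of how Google actually works: the function names, the suffix list, and the example.org URLs are all made up for illustration.

```python
from urllib.parse import urlparse
import posixpath

# Suffixes a crawler might treat as "probably an image" (illustrative list).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".svg"}


def looks_like_image(url: str) -> bool:
    """Guess from the URL path alone whether the target is an image."""
    path = urlparse(url).path
    suffix = posixpath.splitext(path)[1].lower()
    return suffix in IMAGE_SUFFIXES


def crawler_decision(url: str, content_type: str) -> str:
    """Hypothetical routing: trust the suffix first, then check the header."""
    if looks_like_image(url):
        if content_type.startswith("image/"):
            return "index as image"
        # The suffix promised an image but the response was something else.
        return "discard"
    if content_type.startswith("text/html"):
        return "index as page"
    return "ignore"
```

Under this model, a description page such as `https://wiki.example.org/wiki/File:Example.jpg`, served as `text/html`, ends up in the "discard" branch, which is the failure described here.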

This can result in a three-way failure: (1) the nice description page, with its
image preview, text copies of all the metadata, and additional written
descriptions, is ignored by the search engine because it appears to be a
malformed (and potentially malicious) image file; (2) the original full-size
image file with embedded metadata is also not indexed, because Google never
gets to read a page that links directly to it; and (3) the article thumbnail
''is'' indexed, but is low-quality and low-resolution, inherits no embedded
metadata, and might be flagged by the search engine as being associated with a
bad (and potentially broken or malicious) link, so it gets assigned a poor
ranking.

In the normal course of affairs, Google will never get to find out that the
original image files exist. Google also can’t read the wiki’s thumbnail image
listings (which ''do'' contain direct links to the images), because these are
automatically given a NOINDEX tag, which specifically tells Google not to index
them, and this flag doesn't seem to be overridable.

===Presumed (speculated) reason for Google's behaviour:===

We can argue that this problem is not down to a bug with MediaWiki, and is
instead Google's fault - shouldn't Google analyse pages based on content rather
than on apparent filename suffixes?

However, Google can counter-argue that ignoring apparent filenames would make
their search routines less efficient, and that authors should be encouraged to
use appropriate filetype suffixes for their files. Since
maliciously-constructed JPG files are a known vector for malware, there's even
an argument that Google ''should'' be deliberately boycotting URLs that suggest
that they lead to image files (but don't), on principle.

In any case, search engine optimisation is the job of a webpage author, not
Google, and if we decide to make our web-pages operate in a way that is
misleading and results in pages not being crawled, then that's our problem
rather than Google's.

===Partial temporary workarounds:===
A wiki’s owner can add direct (Google-followable) links to the original image
files themselves, either (1) by manually compiling a separate listings page
with the direct links (which includes the images but lacks any surrounding
referential context), (2) by manually using the "link=" property for each
individual manually-embedded thumbnail (which can involve a lot of extra
work), or (3) by replacing MediaWiki’s "File:" link syntax with a custom
thumbnail template that includes both a link to the image description page and
a direct link to the original image.
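
As a rough illustration of workarounds (2) and (3), the wikitext involved could be generated like this. The helper names are hypothetical. The sketch relies on MediaWiki's "link=" image parameter and the "filepath" parser function; as far as I know "link=" is not honoured for the "thumb" and "frame" formats, so a plain scaled image is used here, and that should be checked against the MediaWiki version in use.

```python
def direct_link_image(file_name: str, width_px: int) -> str:
    """Wikitext for a scaled image whose click target is the file itself."""
    # {{filepath:...}} is expanded by MediaWiki to the URL of the original file.
    return f"[[File:{file_name}|{width_px}px|link={{{{filepath:{file_name}}}}}]]"


def image_with_info_link(file_name: str, width_px: int) -> str:
    """Image linking to the original, plus a text link to the description page."""
    # A leading colon makes [[:File:...]] an ordinary link to the description
    # page instead of embedding the image again.
    return f"{direct_link_image(file_name, width_px)} [[:File:{file_name}|info]]"
```

For example, `direct_link_image("Example.jpg", 200)` produces `[[File:Example.jpg|200px|link={{filepath:Example.jpg}}]]`, which could be wrapped in a template so that editors don't have to type it out for every image.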

However, the "link=" override method still doesn't solve the problem of
creating corresponding direct links for the Category-page thumbnails generated
by MediaWiki.

If a wiki is used partly as a storage system for a lot of large high-quality
images, then it's quite possible that many of those images will mainly be
accessed via category page thumbnails and may not have separate additional
embedded thumbnails that the "link=" override can be applied to - we still need
some way of telling MediaWiki that we want it to create Google-followable paths
from the category page thumbnails to the full image files.

===Implementation===
The suggested fix would be to have a switch that makes image thumbnails link
directly to the original files, regardless of whether they were created within
the body of an article using the File: syntax, or were automatically generated
near the end of a "Category" page. 

A secondary link pointing to the description page would then be provided
below, on, or beside the thumbnail. This secondary link already exists for
thumbnails generated by "File:", but if the new global override feature were
implemented, a similar "info page" link icon would need to be added to
Category page thumbnails.
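
To make the intended output concrete, here is a minimal sketch of the rendering logic in Python (not MediaWiki's actual PHP code). The function name, the `link_to_original` switch, and the `thumb-info` CSS class are all invented for illustration; the same function would be used for both in-article and Category page thumbnails.

```python
from html import escape


def render_thumbnail(thumb_url: str, file_url: str, description_url: str,
                     link_to_original: bool = False) -> str:
    """Return HTML for one thumbnail, honouring the opt-in switch."""
    # Opt-in: the main click target becomes the original file.
    target = file_url if link_to_original else description_url
    parts = [
        f'<a href="{escape(target, quote=True)}">'
        f'<img src="{escape(thumb_url, quote=True)}" alt=""></a>'
    ]
    if link_to_original:
        # Keep the description page reachable through a secondary link.
        parts.append(
            f'<a class="thumb-info" href="{escape(description_url, quote=True)}">info</a>'
        )
    return "".join(parts)
```

With the switch off, the output matches the current behaviour (one link, to the description page). With it on, a crawler sees a direct link to the original file next to every thumbnail, and readers still have a path to the description page.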

===Possible enhanced implementation===
If the bug-fixer wanted to be especially creative, the flag could support
multiple options, for instance, to allow a choice of icon and icon placement –
a wiki owner could then choose to specify, say, that an "INFO" strip icon sits
below every thumbnail, or that a red dot icon or a "page corner curl" icon
floats superimposed on top of the bottom right corner of every thumbnail image
to link to the description page, while the rest of the exposed thumbnail links
to the image itself.
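
One way to picture such a multi-option flag is as a small settings object. The sketch below is purely illustrative: the class names, option values, and CSS class scheme are assumptions, and nothing like this exists in MediaWiki today.

```python
from dataclasses import dataclass
from enum import Enum


class InfoIcon(Enum):
    INFO_STRIP = "info-strip"
    RED_DOT = "red-dot"
    CORNER_CURL = "corner-curl"


class IconPlacement(Enum):
    BELOW = "below"
    BOTTOM_RIGHT_OVERLAY = "bottom-right-overlay"


@dataclass(frozen=True)
class ThumbLinkOptions:
    """Hypothetical per-wiki settings for the thumbnail link behaviour."""
    link_to_original: bool = False  # opt-in; default keeps current behaviour
    icon: InfoIcon = InfoIcon.INFO_STRIP
    placement: IconPlacement = IconPlacement.BELOW

    def css_classes(self) -> str:
        """CSS classes for the secondary description-page link, if any."""
        if not self.link_to_original:
            return ""
        return (f"thumb-info thumb-info-{self.icon.value} "
                f"thumb-info-{self.placement.value}")
```

A wiki owner who wanted the red dot in the bottom right corner would then set `ThumbLinkOptions(True, InfoIcon.RED_DOT, IconPlacement.BOTTOM_RIGHT_OVERLAY)`, and the skin's stylesheet would take care of the actual positioning.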

Although the priority for this fix would be to allow a wiki’s administrator to
solve the current problem with Google not indexing full images (without really
changing the look of the pages), an "enhanced" implementation with choice of
infopage icon and position would give MediaWiki additional visual customisation
options.

