https://bugzilla.wikimedia.org/show_bug.cgi?id=46424
Web browser: ---
Bug ID: 46424
Summary: Urls with curid query indexed by Google
Product: MediaWiki
Version: 1.20.0
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: General/Unknown
Assignee: [email protected]
Reporter: [email protected]
CC: [email protected]
Classification: Unclassified
Mobile Platform: ---
This was supposedly fixed (bug 16865; r45360).
And though MediaWiki is indeed outputting "noindex", Google appears to be
ignoring it and as such is indexing duplicate content.
A few examples:
https://www.google.com/search?q=inurl:curid+site:mediawiki.org
1. Discussion - MediaWiki
www.mediawiki.org/?curid=84252
Mar 29, 2012
Hi! Searching for the shortest urls for wikis using scripts other then Latin
was a longtime nightmare. urls using the "wgArticleId" from ...
2. Link to - MediaWiki
www.mediawiki.org/?curid=84277
Mar 30, 2012
mw.config.set({,,,, wgPageName":"Ernst_Lossa","wgTitle":"Ernst Lossa",
"wgCurRevisionId":99548829,"wgArticleId":2809853, ...} ...) nr.
https://www.google.com/search?q=inurl:curid+site:wikipedia.org
3. [edit] Notes - Wikipedia
en.wikipedia.org/wiki/index.html?curid=7490642&action=render
My Brightest Diamond is the project of singer–songwriter and
multi-instrumentalist Shara Worden. The band has released three studio albums,
2006's Bring Me ...
4. Wikipedia, the free encyclopedia
en.wikipedia.org/wiki?curid=
The 1950 Atlantic hurricane season was the first year in the Atlantic
hurricane database (HURDAT) in which storms were given names by the United
States Air ...
5. Wikipedia
simple.wikipedia.org/?curid=
This is the front page of the Simple English Wikipedia. Wikipedias are
places where people work together to write encyclopedias in different
languages. We use ...
6. Table tennis at the 2004 Summer Paralympics - Wikipedia, the free ...
en.wikipedia.org/wiki/index.html?curid=1011065
Table Tennis at the 2004 Summer Paralympics was staged at the Galatsi
Olympic Hall from September 18 to September 27. Competitors were divided into
ten ...
7. Upper Eastside - Wikipedia, the free encyclopedia
en.m.wikipedia.org/wiki/index.html?curid=19698600
A MiMo restaurant on Biscayne Boulevard in the Upper Eastside. The Upper
Eastside is famous for its post war MiMo architecture, and is home to the MiMo
...
8. Robert Loggia - Wikipédia
fr.wikipedia.org/wiki/?curid=899678
Translate this page
Vous pouvez partager vos connaissances en l'améliorant (comment ?) selon les
recommandations des projets correspondants. Robert Loggia est un acteur et ...
What I found is that:
- The ones from mediawiki.org are LiquidThreads pages. LQT apparently overrides
this logic from Article.php and as such is not outputting "robots => noindex".
So those are a flaw on our end.
- #3 has action=render. That's never supposed to be indexed (separate bug?) but
the way it is used circumvents some of our defences. #3 accesses an article
by the name of "index.html", but then overrides it with the curid and tacks on
action=render. Basically doing:
en.wikipedia.org/wiki/Some_page_name?curid=7490642&action=render
- #4 and #5 have an empty curid
- #6 and #7 are more examples of this odd "index.html" title
- #8 is like the ones on mediawiki.org except that these are not from LQT and
are actually outputting "noindex". This is the main problem.
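To make the grouping above easier to reproduce, here is a rough sketch of how
the example urls could be sorted into those buckets. The function name and the
label strings are made up for illustration; this is not MediaWiki code, and it
only looks at the url shape, not at what the page actually outputs.

```python
from urllib.parse import urlsplit, parse_qs


def classify_curid_url(url: str) -> list[str]:
    """Label the oddities of a curid url (hypothetical helper, illustration only)."""
    # Search result urls are shown without a scheme, so add one before parsing.
    parts = urlsplit(url if "://" in url else "https://" + url)
    query = parse_qs(parts.query, keep_blank_values=True)
    labels = []

    # Cases #4 and #5: the curid parameter is present but empty.
    if "curid" in query and query["curid"][0] == "":
        labels.append("empty-curid")

    # Case #3: action=render tacked on.
    if query.get("action", [""])[0] == "render":
        labels.append("action-render")

    # Cases #3, #6, #7: the title "index.html" is requested via /wiki/,
    # but curid decides which page is served.
    title = parts.path[len("/wiki/"):] if parts.path.startswith("/wiki/") else ""
    if title == "index.html" and "curid" in query:
        labels.append("index.html-title-overridden-by-curid")

    return labels
```

For instance, classify_curid_url("fr.wikipedia.org/wiki/?curid=899678") yields
no labels at all, which matches #8: the url looks ordinary, so only the robots
output can explain why it should not have been indexed.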
Though it is somewhat outside the scope of this bug, I think we should:
* Always output rel=canonical when viewing a regular page
(that is: not a Special page, action is view, and no diff or oldid)
So any url, no matter how weirdly constructed, with:
- /?title=
- /w?title=
- /w/index.php?title=
- any of the above with curid instead of title
- any of the above via /wiki/
- any of the above with action=view
Right now we're only doing rel=canonical on redirects which makes no sense to
me.
It is perfectly fine to output rel=canonical on the canonical page itself.
* Always output noindex when viewing a page without rel=canonical.
Any wikipage/action=view that is not a simple view of the latest version of
an article,
e.g. with diff or oldid
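
The two rules proposed above could be sketched roughly as follows. This is
only an illustration of the decision, not how MediaWiki implements it: the
function name, the is_special_page flag and the three return values are
assumptions made for the example. Requests the proposal does not cover
(Special pages, non-view actions) are reported as "unchanged".

```python
from urllib.parse import parse_qs


def head_policy(query_string: str, is_special_page: bool = False) -> str:
    """Decide what a page view should output (hypothetical sketch).

    Returns "canonical", "noindex", or "unchanged" when the proposal
    does not apply to the request.
    """
    params = parse_qs(query_string, keep_blank_values=True)
    # A missing or empty action is treated as the default view action.
    action = params.get("action", ["view"])[0] or "view"

    # Special pages and non-view actions are outside the proposal.
    if is_special_page or action != "view":
        return "unchanged"

    # A plain view of the latest version, however the url was built
    # (title or curid, with or without action=view): rel=canonical.
    if "diff" not in params and "oldid" not in params:
        return "canonical"

    # Any other page view (diff, oldid): noindex.
    return "noindex"
```

With this, head_policy("curid=899678") and head_policy("title=Foo&action=view")
both give "canonical", while head_policy("title=Foo&oldid=123") gives "noindex".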