https://bugzilla.wikimedia.org/show_bug.cgi?id=67157

            Bug ID: 67157
           Summary: CirrusSearch: Failing to reindexMeta
           Product: MediaWiki extensions
           Version: unspecified
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: Unprioritized
         Component: CirrusSearch
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected], [email protected],
                    [email protected]
       Web browser: ---
   Mobile Platform: ---

We're having trouble reindexing meta because we're hitting a page with an
external link that contains invalid UTF-8:

[2014-06-26 18:43:29,960][DEBUG][action.bulk              ] [elastic1018]
[metawiki_general_1403807864][5] failed to execute bulk item (index) index
{[metawiki_general_1403807864][page][661035],
source[{"namespace":2,"namespace_text":"User","title":"COIBot/Local/selftrans.narod.ru","timestamp":"2011-10-11T04:19:40Z","category":["Pages
where template include size is exceeded","Noindexed pages","COIBot Local
Reports"],"external_link":["//wikipediatools.appspot.com/linksearch.jsp?set=top20&link=selftrans.narod.ru","//wikipediatools.appspot.com/linksearch.jsp?set=top40&link=selftrans.narod.ru","//wikipediatools.appspot.com/linksearch.jsp?set=major&link=selftrans.narod.ru","http://www.google.com/search?num=10&hl=en&rls=en&q=selftrans.narod.ru","//www.google.com/search?num=100?h1=en&rls=en&q=selftrans.narod.ru+site:en.wikipedia.org","//www.google.com/search?num=100&hl=en&rls=en&q=selftrans.narod.ru+site:fr.wikipedia.org","//www.google.com/search?num=100&hl=en&rls=en&q=selftrans.narod.ru+site:de.wikipedia.org","//www.google.com/search?num=100&hl=en&rls=en&q=selftrans.narod.ru+site:meta.wikimedia.org","http://siteexplorer.search.yahoo.com/advsearch?p=selftrans.narod.ru&bwm=i&bwmf=d&bwms=p","//toolserver.org/~erwin85/xwiki.php?report=User:COIBot/LinkReports/selftrans.narod.ru&forcelive=1","//toolserver.org/~erwin85/xwiki.php?report=User:COIBot/Local/selftrans.narod.ru&forcelive=1","//tools.wmflabs.org/searchsbl/?url=selftrans.narod.ru","http://whois.domaintools.com/selftrans.narod.ru","http://www.aboutus.org/selftrans.narod.ru","http://www.malwaredomainlist.com/mdl.php?search=selftrans.narod.ru&colsearch=Domain&quantity=50","http://www.alexa.com/data/details/main?url=selftrans.narod.ru","http://213.180.199.13","//wikipediatools.appspot.com/linksearch.jsp?set=top20&link=213.180.199.13","//wikipediatools.appspot.com/linksearch.jsp?set=top40&link=213.180.199.13","//wikipediatools.appspot.com/linksearch.jsp?set=major&link=213.180.199.13","http://www.google.com/search?num=10&hl=en&rls=en&q=213.180.199.13","//www.google.com/search?num=100?h1=en&rls=en&q=213.180.199.13+site:en.wikipedia.org","//www.google.com/search?num=100&hl=en&rls=en&q=213.180.199.13+site:fr.wikipedia.org","//www.google.com/search?num=100&hl=en&rls=en&q=213.180.199.13+site:de.wikipedia.org","//www.google.com/search?num=100&hl=en&rls=en&q=213.180.199.13+site:meta.wikimedia.org","http://siteexplorer.search.yahoo.com
/advsearch?p=213.180.199.13&bwm=i&bwmf=d&bwms=p","//tools.wmflabs.org/searchsbl/?url=213.180.199.13","http://whois.domaintools.com/213.180.199.13","http://www.aboutus.org/213.180.199.13","http://www.malwaredomainlist.com/mdl.php?search=213.180.199.13&colsearch=Domain&quantity=50","http://www.alexa.com/data/details/main?url=213.180.199.13","http://uk.wikipedia.org/wiki/Mediawiki:Spam-whitelist","http://www.google.com/search?q=%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%82%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%82%C3%83%C2%83%C3%82%C2%82%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%82%C3%83%C2%82%C3%82%C2%82%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%82%C3%83%C2%83%C3%82%C2%82%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%82%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%82%C3%83%C2%83%C3%82%C2%82%C3%83%C2%82%C3%82%C2%82%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%82%C3%83%C2%82%C3%82%C2%82%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%82%C3%83%C2%83%C3%82%C2%82%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%82%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%82%C3%83%C2%83%C3%82%C2%82%C3%83%C2%82%C3%82%C2%83%C3%83%C2%83%C3%82%C2%83%C3

...

java.lang.IllegalArgumentException: Document contains at least one immense term
in field="external_link" (whose UTF8 encoding is longer than the max length
32766), all of which were skipped.  Please correct the analyzer to not produce
such terms.  The prefix of the first immense term is: '[68 74 74 70 3a 2f 2f 77
77 77 2e 67 6f 6f 67 6c 65 2e 63 6f 6d 2f 73 65 61 72 63 68 3f 71]...'
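
The byte prefix in the exception can be decoded to see which link is the
offender, and the same 32766-byte limit can be checked before a document is
sent to the index. The following is only a minimal sketch of that idea, not
CirrusSearch code: the helper names decode_hex_prefix and split_by_term_size
are hypothetical, and it assumes each external link is indexed as a single
term.

```python
# Limit quoted in the exception message above.
MAX_TERM_BYTES = 32766


def decode_hex_prefix(prefix: str) -> str:
    """Turn a space-separated hex dump such as '68 74 74 70' into text."""
    raw = bytes.fromhex(prefix.replace(" ", ""))
    return raw.decode("utf-8", errors="replace")


def split_by_term_size(links, limit=MAX_TERM_BYTES):
    """Hypothetical helper: separate links that fit in one term from those
    whose UTF-8 encoding is longer than the limit."""
    kept, dropped = [], []
    for link in links:
        size = len(link.encode("utf-8"))
        (kept if size <= limit else dropped).append(link)
    return kept, dropped


# Prefix copied from the exception message.
prefix = ("68 74 74 70 3a 2f 2f 77 77 77 2e 67 6f 6f 67 6c 65 2e 63 6f 6d "
          "2f 73 65 61 72 63 68 3f 71")
print(decode_hex_prefix(prefix))  # http://www.google.com/search?q
```

The decoded prefix matches the start of the long google.com/search?q= link
in the log, so that link is the immense term.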


I'm not sure whether this check is new in 1.2.1 or what.
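
The repeating %C3%83%C2%83%C3%82%C2%83 pattern in the link looks like text
that was encoded as UTF-8 and re-read as Latin-1 many times over. That is an
assumption about how the link was produced, not something the log confirms.
Under that assumption, the sketch below (double_encode is a hypothetical
name) shows how the byte length doubles with every round, so a single
accented character passes 32766 bytes after 14 rounds.

```python
from urllib.parse import quote


def double_encode(text: str, rounds: int) -> str:
    """Re-read the UTF-8 bytes of `text` as Latin-1, `rounds` times.

    Assumed model of the corruption; each round doubles the UTF-8 length
    of any non-ASCII character.
    """
    for _ in range(rounds):
        text = text.encode("utf-8").decode("latin-1")
    return text


for rounds in (0, 1, 2, 14):
    mangled = double_encode("é", rounds)
    # Print the round count, the UTF-8 byte length, and the start of the
    # percent-encoded form as it would appear in a URL.
    print(rounds, len(mangled.encode("utf-8")), quote(mangled)[:24])
```

Percent-encoding the result triples the length again, since each byte
becomes three ASCII characters in the URL.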
