https://bugzilla.wikimedia.org/show_bug.cgi?id=45983
Web browser: ---
Bug ID: 45983
Summary: Enable creation of dumps dedicated to feeding a search index
Product: MediaWiki
Version: 1.21-git
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: ContentHandler
Assignee: wikidata-b...@lists.wikimedia.org
Reporter: daniel.kinz...@wikimedia.de
CC: wikidata-b...@lists.wikimedia.org
Classification: Unclassified
Mobile Platform: ---

Some search backends, like LuceneSearch, rely on XML dumps to build the search index. The indexer has no knowledge of content models, so it indexes everything in the dump as-is. For non-text content models, this means it indexes the serialized form, which often leads to bad results (see bug 42234).

To solve this, a brief discussion on wikitech-l suggested adding an option to the dump creation process that outputs generated text instead of raw serialized data. This option could then be used to create dumps specifically for rebuilding a search index. See http://www.gossamer-threads.com/lists/wiki/wikitech/340638

The Content interface already defines the function getTextForSearchIndex for generating such pseudo-content. It only needs to be hooked up to dump generation.
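
To illustrate the proposal, here is a rough sketch of a dump writer with such an option. It is written in Python for brevity and is not MediaWiki code: the Revision class, the "json" content model, the dump_page function and its for_search_index flag are all hypothetical, and text_for_search_index only stands in for what getTextForSearchIndex might return for a structured model.

```python
import json
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class Revision:
    """Hypothetical stand-in for a page revision seen by the dumper."""
    title: str
    model: str       # content model name, e.g. "wikitext" or "json"
    serialized: str  # raw serialized form as stored


def _strings(value):
    """Yield every string value found in a decoded JSON structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from _strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from _strings(item)


def text_for_search_index(rev: Revision) -> str:
    """Assumed analogue of getTextForSearchIndex: plain text for indexing."""
    if rev.model == "json":
        # Keep only the human-readable strings, dropping keys and numbers.
        return " ".join(_strings(json.loads(rev.serialized)))
    # Text-based models are already suitable for indexing as-is.
    return rev.serialized


def dump_page(rev: Revision, for_search_index: bool = False) -> str:
    """Emit one simplified <page> element, optionally with generated text."""
    text = text_for_search_index(rev) if for_search_index else rev.serialized
    return (
        f"<page><title>{escape(rev.title)}</title>"
        f"<text>{escape(text)}</text></page>"
    )
```

With the flag off, the dump carries the serialized data unchanged, so existing consumers are unaffected. With it on, an indexer that knows nothing about content models still receives text it can index sensibly.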