https://bugzilla.wikimedia.org/show_bug.cgi?id=45983
Web browser: ---
Bug ID: 45983
Summary: Enable creation of dumps dedicated to feeding a search index
Product: MediaWiki
Version: 1.21-git
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: ContentHandler
Assignee: [email protected]
Reporter: [email protected]
CC: [email protected]
Classification: Unclassified
Mobile Platform: ---
Some search backends, like LuceneSearch, rely on XML dumps to build the search
index. The indexer has no knowledge of content models, so it will index
everything in the dump as-is. For non-text content models, this means it will
index the serialized form, which will often lead to bad results (see bug
42234).
To solve this, a brief discussion on wikitech-l suggested implementing an
option for the dump creation process that outputs generated text instead of
raw serialized data. This option could then be used to create dumps
specifically for rebuilding a search index. See
http://www.gossamer-threads.com/lists/wiki/wikitech/340638
The Content interface already defines the method getTextForSearchIndex for
generating such pseudo-content. It only needs to be hooked up to dump
generation.
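
The proposed option could look roughly like the sketch below. It is written
in Python only to illustrate the idea and is not MediaWiki code: the classes
TextContent and JsonContent, the function dump_revision_text, and the flag
for_search_index are all hypothetical stand-ins. Only the idea of a
getTextForSearchIndex-style method on the content object comes from the
Content interface mentioned above.

```python
import json
from xml.sax.saxutils import escape


class TextContent:
    """Hypothetical stand-in for a plain wikitext content object."""

    def __init__(self, text):
        self.text = text

    def serialize(self):
        # For text models the stored form is already readable text.
        return self.text

    def get_text_for_search_index(self):
        return self.text


class JsonContent:
    """Hypothetical stand-in for a non-text content model stored as JSON."""

    def __init__(self, data):
        self.data = data

    def serialize(self):
        # This is what an unaware indexer would see in a normal dump.
        return json.dumps(self.data, sort_keys=True)

    def get_text_for_search_index(self):
        # Assumption: only string values are useful to a full-text index.
        return " ".join(v for v in self.data.values() if isinstance(v, str))


def dump_revision_text(content, for_search_index=False):
    """Build the <text> element for one revision of a dump.

    With for_search_index=True the generated search text is written
    instead of the raw serialized data.
    """
    if for_search_index:
        body = content.get_text_for_search_index()
    else:
        body = content.serialize()
    return "<text>" + escape(body) + "</text>"


item = JsonContent({"label": "Berlin", "population": 3500000})
print(dump_revision_text(item))
print(dump_revision_text(item, for_search_index=True))
```

With the flag off, the dump carries the serialized JSON, which is what the
indexer currently sees. With the flag on, it carries only the generated text.
For text-based models both modes produce the same output, so a search dump
would differ from a regular dump only for non-text content models.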