https://bugzilla.wikimedia.org/show_bug.cgi?id=21061

           Summary: Add uploaded file text and metadata from files to
                    fulltext search set
           Product: MediaWiki
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: Normal
         Component: Search
        AssignedTo: [email protected]
        ReportedBy: [email protected]
            Blocks: 6421,6422,13370


We're starting to integrate text extraction for DjVu and PDF files (currently
used by the ProofreadPage extension), but the extracted text is not yet exposed
to search indexing.

This is also frequently requested for text document types like .doc and .odf.
Some extensions already do this, but there's no clean interface to plug such
extraction into that all search backends can support.

Note that supporting the Lucene search, which updates its index separately,
may require some additional attention.

Related bugs:
* bug 6421 - search djvu file text
* bug 6422 - search pdf file text
* bug 13370 - search file metadata

Another interesting idea:
* bug 18045 - search text of linked files (though if these are remote, that's
much harder to handle!)

Things we need:
* a clear interface on File for data that needs to be fetched (EXIF metadata,
page text)
* a clear interface on the SearchEngine class for plugging additional info into
index updates
* a way to expose additional searchable info to the Lucene search's updaters
(perhaps a plugin to the OAI interface that adds extra data fields?)
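To make the first two items a bit more concrete, here is a rough sketch of how the pieces could fit together. It is written in Python purely for illustration (MediaWiki itself is PHP), and every name in it (FileInfo, SearchDataProvider, PageTextProvider, MetadataProvider, SearchUpdate, the "file_text" and "file_metadata" field names, and the MIME checks) is hypothetical, not an existing MediaWiki API. The idea: each kind of fetched data (page text, EXIF-style metadata) is exposed through a small provider on the file side, and the search side merges whatever the providers return into one update document that any backend could consume.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol


@dataclass
class FileInfo:
    """Hypothetical stand-in for a MediaWiki File object."""
    name: str
    mime: str
    metadata: Dict[str, str] = field(default_factory=dict)  # e.g. EXIF key/values
    page_text: List[str] = field(default_factory=list)      # e.g. extracted DjVu/PDF pages


class SearchDataProvider(Protocol):
    """Hypothetical file-side interface: one provider per kind of fetched data."""

    def handles(self, f: FileInfo) -> bool: ...

    def fields(self, f: FileInfo) -> Dict[str, str]: ...


class PageTextProvider:
    """Exposes extracted page text of DjVu/PDF files as one searchable field."""

    def handles(self, f: FileInfo) -> bool:
        return f.mime in ("image/vnd.djvu", "application/pdf")

    def fields(self, f: FileInfo) -> Dict[str, str]:
        return {"file_text": "\n".join(f.page_text)}


class MetadataProvider:
    """Flattens metadata key/value pairs (sorted by key) into one searchable field."""

    def handles(self, f: FileInfo) -> bool:
        return bool(f.metadata)

    def fields(self, f: FileInfo) -> Dict[str, str]:
        pairs = sorted(f.metadata.items())
        return {"file_metadata": " ".join(f"{k} {v}" for k, v in pairs)}


class SearchUpdate:
    """Hypothetical search-side hook: merges provider fields into one index update."""

    def __init__(self) -> None:
        self.providers: List[SearchDataProvider] = []

    def register(self, provider: SearchDataProvider) -> None:
        self.providers.append(provider)

    def build_document(
        self, title: str, wikitext: str, f: Optional[FileInfo] = None
    ) -> Dict[str, str]:
        # Start with the normal page fields every backend already indexes.
        doc = {"title": title, "text": wikitext}
        if f is not None:
            # Let every applicable provider contribute its extra fields.
            for p in self.providers:
                if p.handles(f):
                    doc.update(p.fields(f))
        return doc


if __name__ == "__main__":
    updater = SearchUpdate()
    updater.register(PageTextProvider())
    updater.register(MetadataProvider())
    pdf = FileInfo("Example.pdf", "application/pdf", {"Author": "Jane"}, ["page one", "page two"])
    print(updater.build_document("File:Example.pdf", "Description page text", pdf))
```

Keeping extraction in providers, separate from the backends, means each backend (including a separately-updating one like Lucene, e.g. via extra fields in its feed) would only need to know how to index the merged dictionary, not how to read DjVu, PDF or EXIF data itself.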



_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
