https://bugzilla.wikimedia.org/show_bug.cgi?id=30906

       Web browser: ---
             Bug #: 30906
           Summary: Store DjVu extracted text in a structured table
                    instead of img_metadata
           Product: MediaWiki
           Version: 1.19-svn
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: Unprioritized
         Component: Images and files
        AssignedTo: [email protected]
        ReportedBy: [email protected]
                CC: [email protected], [email protected]
            Blocks: 6421, 30751
    Classification: Unclassified


When DjVu files contain text layers, we currently extract the text and store it
in the file's metadata blob, where it is available to extensions like
ProofreadPage that use it.

Unfortunately this *massively* increases the size of the file object, which
holds the uncompressed serialized metadata blob in memory. This leads to
errors like bug 30751, where an API request that loads many file objects at
once runs out of memory.

In addition, the text is awkward to access from other places. Features like
search indexing (bug 6421) would benefit from a more standard place to get at
extracted text, and the same storage could also hold text extracted from other
file formats.

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
You are on the CC list for the bug.

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l