https://bugzilla.wikimedia.org/show_bug.cgi?id=30906
Web browser: ---
Bug #: 30906
Summary: Store DjVu extracted text in a structured table
instead of img_metadata
Product: MediaWiki
Version: 1.19-svn
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: Images and files
AssignedTo: [email protected]
ReportedBy: [email protected]
CC: [email protected], [email protected]
Blocks: 6421, 30751
Classification: Unclassified
When DjVu files contain text layers, we currently extract them and store them
in the file's metadata blob, so the text is available to extensions such as
ProofreadPage.
Unfortunately this *massively* increases the size of the file object -- which
holds the uncompressed serialized metadata blob in memory -- leading to
errors like bug 30751, where an API request runs out of memory when loading
many file objects at once.
In addition, it's a bit awkward to access the text from other places; things
like search indexing (bug 6421) would benefit from a more standard place to
get at extracted text, and the same storage could also be used for other file
formats.
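
As a rough illustration of the idea (not a proposed MediaWiki schema), the
sketch below uses Python's sqlite3 module to keep extracted text in its own
table, one row per page, so that loading file rows never pulls the text into
memory. The file_text table, its ft_* columns, and all helper function names
are hypothetical; only img_name and img_metadata mirror existing column names.

```python
import sqlite3


def create_schema(conn):
    """Create a simplified image table plus a hypothetical per-page text table."""
    conn.executescript(
        """
        CREATE TABLE image (
            img_name     TEXT PRIMARY KEY,
            img_metadata BLOB NOT NULL      -- stays small: no text layer inside
        );
        CREATE TABLE file_text (            -- hypothetical table name
            ft_img_name TEXT    NOT NULL REFERENCES image (img_name),
            ft_page     INTEGER NOT NULL,
            ft_text     TEXT    NOT NULL,
            PRIMARY KEY (ft_img_name, ft_page)
        );
        """
    )


def store_file(conn, name, metadata, pages):
    """Insert the file row and one text row per page (pages are 1-based)."""
    with conn:  # commit on success, roll back on error
        conn.execute("INSERT INTO image VALUES (?, ?)", (name, metadata))
        conn.executemany(
            "INSERT INTO file_text VALUES (?, ?, ?)",
            [(name, number, text) for number, text in enumerate(pages, start=1)],
        )


def load_file_rows(conn, names):
    """Load many file rows at once without touching the extracted text."""
    if not names:
        return []
    marks = ",".join("?" for _ in names)
    query = (
        "SELECT img_name, img_metadata FROM image "
        f"WHERE img_name IN ({marks}) ORDER BY img_name"
    )
    return conn.execute(query, list(names)).fetchall()


def get_page_text(conn, name, page):
    """Fetch the text of a single page on demand, or None if there is none."""
    row = conn.execute(
        "SELECT ft_text FROM file_text WHERE ft_img_name = ? AND ft_page = ?",
        (name, page),
    ).fetchone()
    return row[0] if row else None
```

With a layout along these lines, a batch API request would only call
something like load_file_rows(), while ProofreadPage or a search indexer
would fetch text page by page through get_page_text(). Nothing in the text
table is DjVu-specific, so other formats with extractable text could use it
as well.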
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l