[Bug 6421] Extract embedded text from DjVu and PDF documents for search

bugzilla-daemon Tue, 04 Sep 2012 08:32:13 -0700

https://bugzilla.wikimedia.org/show_bug.cgi?id=6421


--- Comment #3 from DrTrigon <[email protected]> 2012-09-04 15:32:08 UTC ---
As far as I can see, this bug is quite old and is additionally marked "low"
priority. Is this bug going to be fixed at all? In my opinion, fixing this bug
is a *must-have*.

DrTrigonBot [1] does file-content-based categorization on Commons. For this,
embedded text from PDF files (and later DjVu files too) is extracted and
processed. We are currently debating [2] whether or not to store this text data
on a page, in order to enable the MediaWiki search engine to index and find
those contents.

So the question is: when is this bug scheduled to be fixed? Will it be fixed
at all? If not, then, as mentioned, DrTrigonBot could dump the files' text
content to a dedicated page in order to enable the MediaWiki search engine to
handle it. This should be considered a workaround only; it would not be needed
at all if and when this bug is solved.

[1] http://commons.wikimedia.org/wiki/User:DrTrigonBot
[2] http://commons.wikimedia.org/wiki/User_talk:DrTrigonBot/JavaScript#PDF_content_extraction

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l