subject:"[Wikitech-l] Need a way to modify text before indexing (was SearchUpdate)"

Re: [Wikitech-l] Need a way to modify text before indexing (was SearchUpdate)

2015-10-14 Thread vitalif

FWIW, we do index the full text of (PDF and?) DjVu files on Commons (because it's stored in img_metadata). It's probably the biggest improvement CirrusSearch brought for Commons. And we also index office documents via Tika (*.doc and similar). And I think it should not be a feature of the searc

Re: [Wikitech-l] Need a way to modify text before indexing (was SearchUpdate)

2015-10-14 Thread Federico Leva (Nemo)

FWIW, we do index the full text of (PDF and?) DjVu files on Commons (because it's stored in img_metadata). It's probably the biggest improvement CirrusSearch brought for Commons. Nemo

[Wikitech-l] Need a way to modify text before indexing (was SearchUpdate)

2015-10-14 Thread vitalif

I wrote about my problem ~2 years ago: http://wikitech-l.wikimedia.narkive.com/6G0YPmWQ/need-a-way-to-modify-text-before-indexing-was-searchupdate It seems I've lost the latest message, so I want to reply to it now: With lsearchd and Elasticsearch, we absolutely wouldn't want to munge fi

Re: [Wikitech-l] Need a way to modify text before indexing (was SearchUpdate)

2014-01-15 Thread Chad

On Wed, Jan 15, 2014 at 12:07 AM, Vitaliy Filippov wrote:
>> SearchEngine subclasses can implement getTextFromContent() if they want to
>> override the normal text fetching behavior.
>
> I can't put it into SearchEngine subclass because Tika isn't a search
> engine, it's rather a java applicatio

Re: [Wikitech-l] Need a way to modify text before indexing (was SearchUpdate)

2014-01-15 Thread Vitaliy Filippov

> SearchEngine subclasses can implement getTextFromContent() if they want to override the normal text fetching behavior.

I can't put it into SearchEngine subclass because Tika isn't a search engine, it's rather a java application that runs separately and extracts text from binary files like *

Re: [Wikitech-l] Need a way to modify text before indexing (was SearchUpdate)

2014-01-14 Thread Chad

On Tue, Jan 14, 2014 at 2:33 PM, wrote:
> Hi!
>
> Change https://gerrit.wikimedia.org/r/#/c/79025/ that was merged to 1.22
> breaks my TikaMW extension - I used that hook to extract contents from
> binary files so the user can then search on it.
>
> Maybe you can add some other hook for this purp

[Wikitech-l] Need a way to modify text before indexing (was SearchUpdate)

2014-01-14 Thread vitalif

Hi! Change https://gerrit.wikimedia.org/r/#/c/79025/ that was merged to 1.22 breaks my TikaMW extension - I used that hook to extract contents from binary files so the user can then search on it. Maybe you can add some other hook for this purpose? See also https://github.com/mediawiki4intran

7 matches
