Thanks for the response, Erik; it's been very informative. I have a few
follow-up questions (inline).

On 29 October 2015 17:56:25 EET, Erik Bernhardson 
<[email protected]> wrote:
>On Thu, Oct 29, 2015 at 8:47 AM, Strainu <[email protected]> wrote:
>
>> Hi,
>>
>> I've been reading the mw.org and wikitech pages on CirrusSearch (and
>> the code) in the hope of understanding how the page content is
>> transformed before being sent to ES and how it is kept in ES, and I
>> have a few questions:
>>
>> 1. Is the documentation available anywhere? I don't see it on
>> https://doc.wikimedia.org/
>>
>>
>Feature documentation is at
>https://www.mediawiki.org/wiki/Help:CirrusSearch,
>operational documentation is at
>https://wikitech.wikimedia.org/wiki/Search

I was referring to the code docs, which make it easier to follow the class
hierarchy.
>
>
>> 2. What part of the whole ecosystem transforms the wikitext into
>> indexable text? Where can I find it? It should be somewhere downstream
>> from CirrusSearch\Updater::updateFromTitle(), but I can't figure out
>> where exactly.
>>
>>
>The documents are built using the classes in
>https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/tree/master/includes/BuildDocument

I see you use already-parsed text. I'm wondering whether the output of
mwparserfromhell would work instead: I have some wikitext that is not in a
MW database that I would like to index. I'm guessing I'll have to write some
code, but the idea would be the same.
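
To make the idea concrete, here is a minimal sketch of what I have in mind,
not anything CirrusSearch itself does. It assumes mwparserfromhell is
installed; the field names (title, text, heading, outgoing_link) and the
helper names build_document() and to_bulk_lines() are my own guesses for
illustration and would need to be checked against a real dump. Since
templates are dropped rather than expanded, the text only approximates what
the MediaWiki parser would produce.

```python
import json


def build_document(title, wikitext):
    """Turn raw wikitext into a flat dict that could be indexed in ES.

    Field names are hypothetical; compare them with a real page dump.
    """
    # Third-party dependency (pip install mwparserfromhell). Imported here
    # so that to_bulk_lines() below stays usable without it.
    import mwparserfromhell

    code = mwparserfromhell.parse(wikitext)
    return {
        "title": title,
        # strip_code() removes templates and markup and keeps visible text.
        # Templates are NOT expanded, unlike in MediaWiki's parser output.
        "text": code.strip_code(),
        "heading": [str(h.title).strip() for h in code.filter_headings()],
        "outgoing_link": [str(link.title).strip()
                          for link in code.filter_wikilinks()],
    }


def to_bulk_lines(index, doc_id, doc):
    """Format one document as an action/source pair for the ES _bulk API.

    Older ES versions also expect a "_type" key in the action line.
    """
    action = {"index": {"_index": index, "_id": doc_id}}
    return json.dumps(action) + "\n" + json.dumps(doc) + "\n"
```

The output of to_bulk_lines() for each page could be concatenated and posted
to the cluster's _bulk endpoint; the index name and document IDs are up to
whoever runs the cluster.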

>
>
>> If this transformation doesn't happen, from where is the searchable
>> text obtained?
>>
>> 3. Where can I find the ES schema used for wikipages? Is it different
>> for images/categories?
>>
>>
>ES schema is the same everywhere; the easiest way to see what the data
>looks like is just to request a dump for a particular page. This will
>output JSON. I use a Chrome extension called JsonView to make this look
>nice:
>https://wikitech.wikimedia.org/wiki/Search?action=cirrusdump

That is very cool indeed. 
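
For anyone who prefers a script to a browser extension, a rough sketch of
listing the fields in such a dump follows. It assumes the wiki serves pages
from a base URL such as https://wikitech.wikimedia.org/w/index.php and that
the dump is a JSON list of hits, each carrying its fields under "_source";
both are assumptions on my part, and the function names and User-Agent
string are made up for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def cirrusdump_url(base, title):
    """Build the dump URL for one page; `base` is the wiki's index.php URL."""
    return base + "?" + urlencode({"title": title, "action": "cirrusdump"})


def field_names(dump_json):
    """Return the sorted field names of the first hit in a dump.

    Assumes the dump is a JSON list of hits with a "_source" dict each.
    """
    hits = json.loads(dump_json)
    return sorted(hits[0]["_source"]) if hits else []


def fetch_fields(base, title):
    """Download the dump for `title` and list its fields (needs network)."""
    # A descriptive User-Agent is set because some sites reject the default.
    req = Request(cirrusdump_url(base, title),
                  headers={"User-Agent": "cirrusdump-example/0.1"})
    with urlopen(req) as resp:
        return field_names(resp.read().decode("utf-8"))
```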

Thanks again, 
 Strainu
>
>
>> Thanks,
>>    Strainu
>>
>> _______________________________________________
>> Wikitech-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

