Hi,

I've been reading the mw.org and wikitech pages on CirrusSearch (and
the code), hoping to understand how page content is transformed before
being sent to ES and how it is stored there. I have a few questions:

1. Is the CirrusSearch documentation available anywhere? I don't see
it on https://doc.wikimedia.org/

2. What part of the whole ecosystem transforms the wikitext into
indexable text? Where can I find it? It should be somewhere downstream
from CirrusSearch\Updater::updateFromTitle(), but I can't figure out
where exactly.

If this transformation doesn't happen, where does the searchable text
come from?

3. Where can I find the ES schema used for wiki pages? Is it different
for images/categories?

Thanks,
   Strainu

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l