[Wikitech-l] Lsearch and MWSearch: how to turn on morphology for Russian

2014-01-30 Thread Yury Katkov
Hi guys!

I've installed the MWSearch and Lucene Search extensions, but the search
engine doesn't understand Russian morphology (it doesn't recognize word
forms). How can I turn the morphological analyzer on? How is it done on
Russian Wikipedia?

Cheers,
-
Yury Katkov, WikiVote
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Lsearch and MWSearch: how to turn on morphology for Russian

2014-01-30 Thread Nikolas Everett
I hate to say this after all you went through setting up Lucene Search, but
it is end of life and not receiving any real support.  We're in the process
of replacing it with the combination of
CirrusSearch <https://www.mediawiki.org/wiki/Extension:CirrusSearch> and
Elasticsearch <http://www.elasticsearch.org/>, which work pretty much the
same way the MWSearch/Lucene Search combination does.  CirrusSearch has to
be smarter than MWSearch because Elasticsearch doesn't have any MediaWiki
knowledge, but because it links into MediaWiki it can do things like expand
templates.  I like it, but I'm biased.

That aside, it looks like Lucene Search is supposed to read
InitializeSettings, which is kind of a WMF-specific thing.  You might be
able to trick it into doing so by putting a file called
InitializeSettings.php in the conf directory with the contents

'wgLanguageCode' => array(
 'your $wgDBname' => 'ru',
),


CirrusSearch, if you care to try it, reads the language code from
wgLanguageCode.

Nik




Re: [Wikitech-l] Lsearch and MWSearch: how to turn on morphology for Russian

2014-01-30 Thread Yury Katkov
Hi! I'll definitely try Cirrus, but it would still be interesting to see
Lucene working. Besides, every new extension by WMF typically requires a
very fresh MediaWiki version, which can be a burden for third parties.

I added InitializeSettings.php and ran ./build and ./lsearchd again.
Still no good: when I search for the word банк, I expect Lucene to also
find банков, банки, банке, etc., and I can see that these word forms are
present in the files
LuceneSearch.jar/uzip://org/apache/lucene/analysis/ru/stemsUnicode.txt
and words.Unicode.txt.
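
To make the expectation concrete, here is a toy sketch in Python of what a
stemming analyzer is supposed to do with the word forms above. The suffix
list and the toy_stem function are made up for illustration; this is not
the algorithm or the data that Lucene Search actually uses.

```python
# Toy illustration only: NOT the stemmer shipped with Lucene Search.
# SUFFIXES is a hypothetical, hand-picked subset of Russian noun endings.
SUFFIXES = ("ов", "ам", "ах", "ом", "и", "е", "а", "у")


def toy_stem(word: str) -> str:
    """Strip the longest matching suffix, keeping at least three letters."""
    word = word.lower()
    # Try longer suffixes first so "ов" wins over a one-letter ending.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


forms = ["банк", "банков", "банки", "банке"]
stems = {form: toy_stem(form) for form in forms}
print(stems)
```

If both the indexed text and the query pass through the same reduction,
all four forms collapse to one term, so a query for any of them should
match documents containing the others.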

Still, when I search for банк, I only get банк and the following log:

18409 [pool-2-thread-1] INFO  org.wikimedia.lsearch.search.SearchEngine  -
Using FilterWrapper wrap: {} []
18414 [pool-2-thread-1] INFO  org.wikimedia.lsearch.search.SearchEngine  -
search wikivote: query=[банк] parsed=[custom(+contents:банк^0.2 relevance
([((P contents:банк) (P sections:банк^0.25))^2.0], (P
alttitle:банк~20^2.5) (P related:банк^12.0)) (P alttitle:банк~20))]
hit=[0] in 7ms using IndexSearcherMul:1391088160991
18439 [pool-2-thread-1] INFO  org.wikimedia.lsearch.spell.Suggest  -
wikivote for original=[банк] suggest: [банк] using=[] in 18 ms
24262 [pool-2-thread-2] INFO  org.wikimedia.lsearch.frontend.HttpHandler  -
query:/search/wikivote/%D0%B1%D0%B0%D0%BD%D0%BA?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15%2C90%2C91%2C92%2C93%2C102%2C103%2C106%2C107%2C108%2C109%2C170%2C171&offset=0&limit=20&version=2.1&iwlimit=10&searchall=1
what:search dbname:wikivote term:банк
24263 [pool-2-thread-2] INFO  org.wikimedia.lsearch.search.SearchEngine  -
Using FilterWrapper wrap: {} []
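
For anyone reading the HttpHandler line: the search term and the namespace
list are percent-encoded. A small Python sketch, with both strings copied
from the log above, decodes them; the variable names are only
illustrative.

```python
from urllib.parse import unquote

# Path segment and "namespaces" value copied from the HttpHandler log line.
encoded_term = "%D0%B1%D0%B0%D0%BD%D0%BA"
encoded_namespaces = (
    "0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15"
    "%2C90%2C91%2C92%2C93%2C102%2C103%2C106%2C107%2C108%2C109%2C170%2C171"
)

# unquote() decodes percent-escapes as UTF-8 by default.
term = unquote(encoded_term)
# %2C is an encoded comma, so the value is a comma-separated list of IDs.
namespaces = [int(n) for n in unquote(encoded_namespaces).split(",")]

print(term)
print(namespaces)
```

So the request does carry the unmodified term банк across the listed
namespaces; any stemming would have to happen on the Lucene side.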


-
Yury Katkov, WikiVote


