Re: Loading huge synonym list in Solr

2011-08-05 Thread Arun Atreya
Thanks a ton, Robert.

I checked out the latest nightly and changed the following in my
solrconfig.xml:

<luceneMatchVersion>LUCENE_33</luceneMatchVersion>

to

<luceneMatchVersion>LUCENE_40</luceneMatchVersion>

The new SynonymFilter loaded all the 1.9 million lines of synonyms in less
than 5 minutes! Awesome!

Thanks to all who developed this super-duper fast synonym filter!



On Thu, Aug 4, 2011 at 5:01 PM, Robert Muir <rcm...@gmail.com> wrote:

> https://issues.apache.org/jira/browse/LUCENE-3233

> --
> lucidimagination.com



Loading huge synonym list in Solr

2011-08-04 Thread Arun Atreya
Hello,

I would like to know the best way to load a huge synonym list into Solr.

I would like to do concept indexing (a.k.a. category indexing) with Solr. For
example, I want to be able to index all cities and be able to search for all
of them using a special keyword, say 'CONCEPTcity', where 'CONCEPTcity' will
match anything that IS-A city, as specified in the index_synonyms.txt file. I
believe the best way to do this is via the SynonymFilterFactory and do
index-time synonym expansion. Or is there a better alternative?

I would still like to keep the original city names and do not want to
replace them with 'CONCEPTcity', so if someone searches for 'Lake', the city
name 'Salt Lake City' still matches. Also, obviously, I do not want two
different city names to be synonyms of each other.

Is the correct way to specify the index_synonyms.txt file like this?

-
CONCEPTcity, Salt Lake City
CONCEPTcity, New York
CONCEPTcity, San Jose
.
.
.
-

and then keep
expand=true
for SynonymFilterFactory?
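
For context, the index-time setup described above might look roughly like the schema.xml sketch below. Only SynonymFilterFactory, index_synonyms.txt and expand=true come from this message; the field type name text_concept, the whitespace tokenizer, ignoreCase and the lowercase filter are assumptions made for illustration, not the actual configuration in use.

```xml
<!-- schema.xml sketch; "text_concept" is a hypothetical field type name -->
<fieldType name="text_concept" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: expand each city name so that CONCEPTcity is indexed
       alongside the original tokens, which are kept. -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            ignoreCase="true" expand="true"/>
    <!-- Lowercase after expansion so injected tokens are normalized too -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query time: no synonym filter, so a search for 'Lake' or for
       'CONCEPTcity' is matched against what was indexed. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Keeping the synonym filter out of the query analyzer is what makes this index-time-only expansion.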

I tried loading a synonym file with 10K entries like this, and Solr/Jetty
took a few seconds to start. But when I try to load a synonym file with 1M+
entries, startup takes a very long time. What is the best way to do this?

Thanks,
Arun.