Question regarding snapinstaller

2009-11-02 Thread Prasanna Ranganathan

 It looks like the snapinstaller script does an atomic remove and replace of
the entire solr_home/data_dir/index folder with the contents of the new
snapshot before issuing a commit command. I am trying to understand the
implication of the same.

 What happens to queries that come during the time interval between the
instant the existing directory is removed and the commit command gets
finalized? Does a currently running instance of Solr not need the files in
the index folder to serve the query results? Are all the contents of the
index folder loaded into memory?
 
 Thanks in advance for any help.

Regards,

Prasanna.
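
For context on the question above, one relevant operating-system behaviour can be sketched in isolation. On POSIX file systems, a process that already holds a file open can keep reading it after the file's directory entry has been removed. Whether a running Solr instance relies on this during a snapshot install is not confirmed anywhere in this thread. The directory and file names below are hypothetical, and the sketch does not model snapinstaller itself. It assumes a POSIX system (the removal step fails on Windows while the file is open).

```python
import os
import shutil
import tempfile


def read_after_directory_swap():
    """Open a file, replace its whole directory, then read via the old handle."""
    base = tempfile.mkdtemp()
    index_dir = os.path.join(base, "index")      # hypothetical index folder
    segment = os.path.join(index_dir, "segment.dat")

    os.mkdir(index_dir)
    with open(segment, "w") as f:
        f.write("old index data")

    reader = open(segment)                       # long-lived reader, opened before the swap

    # Remove the directory and put new contents in its place.
    shutil.rmtree(index_dir)
    os.mkdir(index_dir)
    with open(segment, "w") as f:
        f.write("new index data")

    old_view = reader.read()                     # still served from the removed file
    reader.close()

    with open(segment) as f:                     # a fresh open sees the new file
        new_view = f.read()

    shutil.rmtree(base)
    return old_view, new_view
```

On a POSIX system the pre-existing handle returns the old contents and a freshly opened handle returns the new contents.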


Re: Question about PatternReplace filter and automatic Synonym generation

2009-10-07 Thread Prasanna Ranganathan


On 10/6/09 3:32 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

 
> : I'll try to explain with an example. Given the term 'it!' in the title, it
> : should match both 'it' and 'it!' in the query as an exact match. Currently,
> : this is done by using a synonym entry (and index-time SynonymFilter) as
> : follows:
> :
> :  it! => it, it!
> :
> : Now, the above holds true for all cases where you have a title token of the
> : form [a-zA-Z]*!. Handling all of those cases requires adding synonyms
> : manually for each case, which is not easy to manage and does not scale.
> :
> : I am hoping to do the same by using an index-time filter that takes in a
> : pattern like the PatternReplace filter and adds the newly created token
> : instead of replacing the original one. Does this make sense? Am I missing
> : something that would break this approach?
>
> something like this would be fairly easy to implement in Lucene, but
> somewhat confusing to try and configure in Solr.  I was going to suggest
> that you use something like...
>
>   <filter class="solr.PatternReplaceFilterFactory"
>           pattern="(^.*)\!?$)" replacement="$1 $2" replace="all" />
>
> ..and then have a subsequent filter that splits the tokens on the
> whitespace (or any other special character you could use in the
> replacement) ... but apparently we don't have any built in filters that
> will just split tokens on a character/pattern for you.  that would also be
> fairly easy to write if someone wants to submit a patch.

 There is a solr.PatternTokenizerFactory class which likely fits the bill in
this case. The related question I have is this - is it possible to have
multiple Tokenizers in your analysis chain?

Prasanna.
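
The replace-then-split idea quoted above can be sketched outside Solr as follows. This is a plain-Python illustration, not a Solr filter. The function name, the default pattern and the replacement string are assumptions chosen for this sketch (the pattern in the quoted message is left as posted), and the whitespace split stands in for the splitting filter that the reply says is not built in.

```python
import re
from typing import List


def replace_then_split(term: str,
                       pattern: str = r"^(([a-zA-Z]+)!)$",
                       replacement: str = r"\1 \2") -> List[str]:
    """Rewrite a matching term as 'original variant', then split on whitespace."""
    # Step 1: pattern replacement. Group 1 is the whole term, group 2 the
    # term without its trailing '!'. Non-matching terms pass through unchanged.
    expanded = re.sub(pattern, replacement, term)
    # Step 2: the hypothetical "split on a character" stage.
    return expanded.split()


tokens_with_bang = replace_then_split("it!")
tokens_plain = replace_then_split("it")
```

With the assumed pattern, 'it!' yields both 'it!' and 'it', while a term without a trailing '!' comes back as a single token.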



Re: Question about PatternReplace filter and automatic Synonym generation

2009-10-05 Thread Prasanna Ranganathan

 Can someone please give me some pointers to the questions in my earlier
email? Any and all help is much appreciated.

Regards,

Prasanna.





Re: Question about PatternReplace filter and automatic Synonym generation

2009-10-05 Thread Prasanna Ranganathan

I just saw the reply from Shalin after sending this email. Kindly excuse.





Re: Question about PatternReplace filter and automatic Synonym generation

2009-10-05 Thread Prasanna Ranganathan



On 10/5/09 2:46 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

> > Alternatively, is there a filter available which takes in a pattern and
> > produces additional forms of the token depending on the pattern? The use
> > case I am looking at here is using such a filter to automate synonym
> > generation. In our application, quite a few of the synonym file entries
> > match a specific pattern and having such a filter would make it easier I
> > believe. Please do correct me in case I am missing some unwanted side-effect
> > with this approach.
>
> I do not understand this. TokenFilters are used for things like stemming,
> replacing patterns, lowercasing, n-gramming etc. The synonym filter inserts
> additional tokens (synonyms) from a file for each token.
>
> What exactly are you trying to do with synonyms? I guess you could do
> stemming etc with synonyms but why do you want to do that?
 
 I'll try to explain with an example. Given the term 'it!' in the title, it
should match both 'it' and 'it!' in the query as an exact match. Currently,
this is done by using a synonym entry (and index-time SynonymFilter) as
follows:

 it! => it, it!

 Now, the above holds true for all cases where you have a title token of the
form [a-zA-Z]*!. Handling all of those cases requires adding synonyms
manually for each case which is not easy to manage and does not scale.

 I am hoping to do the same by using an index-time filter that takes in a
pattern like the PatternReplace filter and adds the newly created token
instead of replacing the original one. Does this make sense? Am I missing
something that would break this approach?
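
A minimal sketch of the behaviour described above, written in plain Python and not as a Lucene TokenFilter. The Token tuple, the function name keep_and_add and the example pattern are assumptions made for illustration. Emitting the variant with a position increment of 0 mirrors how the index-time SynonymFilter stacks a synonym on the position of the original token.

```python
import re
from typing import Iterable, Iterator, Tuple

Token = Tuple[str, int]  # (term, position increment)


def keep_and_add(tokens: Iterable[Token],
                 pattern: str,
                 replacement: str) -> Iterator[Token]:
    """Pass every token through and add a pattern-derived variant next to it."""
    regex = re.compile(pattern)
    for term, increment in tokens:
        yield term, increment                  # always keep the original token
        variant = regex.sub(replacement, term)
        if variant and variant != term:
            yield variant, 0                   # same position as the original


stream = [("it!", 1), ("rocks", 1)]
result = list(keep_and_add(stream, r"^([a-zA-Z]+)!$", r"\1"))
```

Here 'it!' is kept and 'it' is added at the same position, while 'rocks' passes through untouched because the pattern does not match it.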

 
> Note that a change in synonym file needs a re-index of the affected
> documents. Also, the synonym map is kept in memory.

 What is the overhead incurred in having an additional filter applied during
indexing? Is it strictly CPU only?

 Thanks a lot for your valuable input.

Regards,

Prasanna.



Re: Question about PatternReplace filter and automatic Synonym generation

2009-10-05 Thread Prasanna Ranganathan

On 10/5/09 8:59 PM, Christian Zambrano czamb...@gmail.com wrote:

 
> Wouldn't it be better to use built-in token filters at both index and
> query time that will convert 'it!' to just 'it'? I believe the
> WordDelimiterFilterFactory will do that for you.
 

 We do have a field that uses WordDelimiterFilter but it also uses a Stemmer
and Stopword filter. That field is used for a stemmed match with a nominal
boost. However, the field I am talking about is for an exact match (only
lowercase and synonym filter) with a higher boost than the field with the
WordDelimiterFilter.

Prasanna.



Question about PatternReplace filter and automatic Synonym generation

2009-10-02 Thread Prasanna Ranganathan

 Does the PatternReplaceFilter have an option where you can keep the
original token in addition to the modified token? From what I looked at it
does not seem to but I want to confirm the same.

Alternatively, is there a filter available which takes in a pattern and
produces additional forms of the token depending on the pattern? The use
case I am looking at here is using such a filter to automate synonym
generation. In our application, quite a few of the synonym file entries
match a specific pattern and having such a filter would make it easier I
believe. Please do correct me in case I am missing some unwanted side-effect
with this approach.
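
If a suitable filter turns out not to be available, a lower-tech variant of the same idea is to generate the pattern-shaped synonym entries with a script instead of maintaining them by hand. The sketch below assumes the explicit-mapping syntax of the synonym file (term => replacements) and a hypothetical pattern of letters followed by '!'. The function name and the sample terms are made up for illustration.

```python
import re
from typing import Iterable, List


def synonym_lines(terms: Iterable[str],
                  pattern: str = r"^([a-zA-Z]+)!$") -> List[str]:
    """Build one explicit synonym mapping per distinct term matching the pattern."""
    regex = re.compile(pattern)
    lines = []
    for term in sorted(set(terms)):            # de-duplicate, stable output order
        match = regex.match(term)
        if match:
            stripped = match.group(1)          # the term without its trailing '!'
            lines.append(f"{term} => {stripped}, {term}")
    return lines


generated = synonym_lines(["it!", "yahoo!", "plain", "it!"])
```

The generated lines could then be appended to the synonym file before indexing; as with any synonym change, affected documents would need to be re-indexed.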

Continuing on that line, what is the performance hit in having additional
index-time filters as opposed to using a synonym file with more entries? How
does the overhead of using a bigger synonym file as opposed to additional
filters compare?

Thanks in advance for the help.

Regards,

Prasanna.


Effect of SynonymFilter on Solr document fields

2009-09-16 Thread Prasanna Ranganathan

Hi,

 I am a newbie to Solr and request you all to kindly excuse any rookie
mistakes.

 I have the following questions:

We use the Synonym Filter on one of the indexed fields. It is specified in
the schema.xml as one of the filters (for the analyzer type index) for that
field. I believe this means that any token matching an entry in the
synonym file will have all of its forms indexed, provided
expand=true. I am able to verify that by using the Solr admin analysis
tool. However when I use Luke to examine a document in the index which would
have synonyms for that particular field, I see only the original value and
do not see the additional forms that should be added due to the synonym
match for the field in question. I am not sure if I am missing something
here. How do I verify the same?
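
One possible explanation, offered as a sketch and not as a confirmed diagnosis for this index: analysis filters such as the SynonymFilter affect the indexed terms only, while the stored value of a field is the original text that was submitted. A tool that displays the stored value of a document would therefore show only the original text. The toy model below uses a hypothetical synonym map and a whitespace tokenizer to make the distinction concrete.

```python
from typing import Dict, List, Tuple

# Hypothetical synonym map, in expanded form: each term maps to all its forms.
SYNONYMS: Dict[str, List[str]] = {"tv": ["tv", "television"]}


def index_field(value: str) -> Tuple[str, List[str]]:
    """Return (stored_value, indexed_terms) for a single field value."""
    indexed_terms: List[str] = []
    for token in value.lower().split():        # toy lowercase + whitespace analysis
        indexed_terms.extend(SYNONYMS.get(token, [token]))
    return value, indexed_terms                # the stored value is left untouched


stored, indexed = index_field("TV guide")
```

In this model the stored value stays 'TV guide', while the indexed terms include the extra synonym form. Checking the terms of the field, as opposed to the stored document, is where the additional forms would be expected to appear.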

Another related question: the field in question here is not specified as
multivalued. However, as I understand it a synonym match will mean multiple
values for that field. I was not able to find any documentation that
explains this in detail and would like to know how this particular case
impacts the indexing of that field, scoring, etc. How does the behavior of a
field having multiple values due to SynonymFilter compare and contrast with
the multiValued=true|false flag? What would a synonym match expansion for a
field with multiValued=false mean?
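
To make the distinction in the question concrete, here is a toy model with hypothetical names throughout. It assumes that synonym expansion adds terms at the same token position inside one field value, whereas multiValued governs how many separate values a document may supply for the field. It illustrates those two notions only and is not Solr's implementation.

```python
from typing import Dict, List, Tuple


def analyze(value: str, synonyms: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Return (term, position) pairs for ONE field value."""
    terms = []
    for position, token in enumerate(value.lower().split()):
        for term in synonyms.get(token, [token]):
            terms.append((term, position))     # synonyms share the token's position
    return terms


def add_value(doc: Dict[str, List[str]], field: str, value: str,
              multi_valued: bool) -> None:
    """Add a value to a document, rejecting a second value for a single-valued field."""
    values = doc.setdefault(field, [])
    if values and not multi_valued:
        raise ValueError(f"multiple values for non-multiValued field {field!r}")
    values.append(value)


doc: Dict[str, List[str]] = {}
add_value(doc, "title", "TV guide", multi_valued=False)
expanded = analyze(doc["title"][0], {"tv": ["tv", "television"]})
```

The document still holds exactly one value for the field, even though analysis of that value produces two terms at position 0. Supplying a second value for the same single-valued field is what the flag would reject.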

Prasanna.