Re: Implementing Autocomplete/Query Suggest using Solr

2010-01-04 Thread Shalin Shekhar Mangar
On Wed, Dec 30, 2009 at 3:07 AM, Prasanna R plistma...@gmail.com wrote:

  I looked into the Solr/Lucene classes and found the required information.
 I am summarizing it here for the benefit of anyone who refers to this
 thread in the future.

  The change I had to make was very simple: call getPrefixQuery instead of
 getWildcardQuery in my custom-modified Solr dismax query parser class.
 However, this makes a fairly significant difference in efficiency. The key
 difference between Lucene's WildcardQuery and PrefixQuery lies in their
 respective term enumerators, specifically in the term comparators. The
 termCompare method for PrefixQuery is more lightweight than that of
 WildcardQuery; it is essentially an optimization, since a prefix query is
 just a special case of a wildcard query. This is also why the Lucene query
 parser automatically creates a PrefixQuery, rather than a WildcardQuery, for
 query terms of the form 'foo*'.


I don't understand this. There is nothing that one should need to do in
Solr's code to make this work. Prefix queries are supported out of the box
in Solr.


 One final request for comment to Shalin on this topic: I am guessing you
 ensured there were no duplicate terms in the field(s) used for
 autocompletion. For our first version, I am thinking of eliminating the
 duplicates outside the results handler that returns suggestions, since in
 our system duplicate suggestions come only from different document IDs, and
 we do want the list of matched document IDs. Is there a better or different
 way of doing this?


No, I guess not.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Implementing Autocomplete/Query Suggest using Solr

2010-01-04 Thread Prasanna R
On Mon, Jan 4, 2010 at 1:20 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Wed, Dec 30, 2009 at 3:07 AM, Prasanna R plistma...@gmail.com wrote:

   I looked into the Solr/Lucene classes and found the required information.
  I am summarizing it here for the benefit of anyone who refers to this
  thread in the future.
 
   The change I had to make was very simple: call getPrefixQuery instead of
  getWildcardQuery in my custom-modified Solr dismax query parser class.
  However, this makes a fairly significant difference in efficiency. The key
  difference between Lucene's WildcardQuery and PrefixQuery lies in their
  respective term enumerators, specifically in the term comparators. The
  termCompare method for PrefixQuery is more lightweight than that of
  WildcardQuery; it is essentially an optimization, since a prefix query is
  just a special case of a wildcard query. This is also why the Lucene query
  parser automatically creates a PrefixQuery, rather than a WildcardQuery, for
  query terms of the form 'foo*'.
 
 
 I don't understand this. There is nothing that one should need to do in
 Solr's code to make this work. Prefix queries are supported out of the box
 in Solr.

I am using the dismax query parser, matching on multiple fields with
different boosts. I run a prefix query on some fields in combination with a
regular field query on other fields. I do not know of any way to specify a
prefix query on a particular field in a dismax query out of the box in
Solr 1.4, so I had to modify Solr to support additional syntax in a dismax
query that lets you choose to create a prefix query on a particular field.
As part of parsing this custom syntax, I was calling getWildcardQuery, which
I simply changed to getPrefixQuery.
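
The custom dismax syntax itself is not shown in this thread. As a rough illustration of the same idea only, the Java sketch below builds a query string for Solr's standard Lucene query parser in which some fields get a plain term clause and others a prefix clause (field:term*), each with its own boost. The class, field names and boosts are hypothetical. Two caveats: OR-ing clauses in the standard parser sums the clause scores, whereas dismax takes the maximum (plus an optional tie-breaker), so ranking will differ; and a prefix term is not run through the field's analyzer, so the input is assumed to be normalized already.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the custom dismax syntax from this thread: mixes
// exact-term and prefix clauses across fields for the standard query parser.
final class MixedFieldQuery {

    // One field to search: its name, its boost, and whether to match as a prefix.
    static final class FieldClause {
        final String field;
        final double boost;
        final boolean prefix;

        FieldClause(String field, double boost, boolean prefix) {
            this.field = field;
            this.boost = boost;
            this.prefix = prefix;
        }
    }

    // Characters with special meaning in Lucene query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Backslash-escapes query-syntax characters and whitespace in user input.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds e.g. "title:ipod^2.0 OR name_prefix:ipod*^1.5".
    static String build(String userInput, List<FieldClause> clauses) {
        String term = escape(userInput);
        List<String> parts = new ArrayList<>();
        for (FieldClause c : clauses) {
            String value = c.prefix ? term + "*" : term;
            parts.add(c.field + ":" + value + "^" + c.boost);
        }
        return String.join(" OR ", parts);
    }
}
```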

Prasanna.


Re: Implementing Autocomplete/Query Suggest using Solr

2009-12-29 Thread Prasanna R
 
   We do auto-complete through prefix searches on shingles.
  
 
  Just to confirm, do you mean using the EdgeNGram filter to produce letter
  n-grams of the tokens in the chosen field?
 
 

 No, I'm talking about prefix search on tokens produced by a ShingleFilter.


 I did not know about the prefix query parser in Solr. Thanks a lot for
 pointing it out.

 I have found relatively little material online about the Solr/Lucene
 prefix query parser. Please point me to any useful resources I might be
 missing.


I looked into the Solr/Lucene classes and found the required information.
I am summarizing it here for the benefit of anyone who refers to this
thread in the future.

The change I had to make was very simple: call getPrefixQuery instead of
getWildcardQuery in my custom-modified Solr dismax query parser class.
However, this makes a fairly significant difference in efficiency. The key
difference between Lucene's WildcardQuery and PrefixQuery lies in their
respective term enumerators, specifically in the term comparators. The
termCompare method for PrefixQuery is more lightweight than that of
WildcardQuery; it is essentially an optimization, since a prefix query is
just a special case of a wildcard query. This is also why the Lucene query
parser automatically creates a PrefixQuery, rather than a WildcardQuery, for
query terms of the form 'foo*'.
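
The difference can be illustrated with a simplified Java model of the two per-term checks. It is not Lucene's PrefixTermEnum or WildcardTermEnum source, and the class and method names are hypothetical: the prefix check is a single startsWith comparison per candidate term, while the general wildcard check has to handle '*' and '?' anywhere in the pattern and may backtrack. The isPlainPrefixPattern helper models the query-parser behaviour described above, where a term like 'foo*' is routed to the cheaper prefix check.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the per-term checks; NOT Lucene's actual enumerator code.
final class TermMatching {

    // Prefix check: one startsWith comparison per candidate term.
    static boolean prefixMatches(String term, String prefix) {
        return term.startsWith(prefix);
    }

    // General wildcard check: '*' matches any run of characters, '?' exactly one.
    // Greedy matching that backtracks to the most recent '*' on a mismatch.
    static boolean wildcardMatches(String term, String pattern) {
        int t = 0, p = 0, starP = -1, starT = -1;
        while (t < term.length()) {
            if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == term.charAt(t))) {
                t++;
                p++;
            } else if (p < pattern.length() && pattern.charAt(p) == '*') {
                starP = p++; // remember the star; first try matching zero chars
                starT = t;
            } else if (starP != -1) {
                p = starP + 1; // backtrack: let the last '*' absorb one more char
                t = ++starT;
            } else {
                return false;
            }
        }
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++; // trailing stars can match the empty string
        }
        return p == pattern.length();
    }

    // True for patterns like "foo*": one trailing '*' and no other wildcards.
    static boolean isPlainPrefixPattern(String pattern) {
        int star = pattern.indexOf('*');
        return star > 0 && star == pattern.length() - 1 && pattern.indexOf('?') == -1;
    }

    // Expands a pattern against a term list, using the cheaper check when possible.
    static List<String> expand(List<String> terms, String pattern) {
        List<String> out = new ArrayList<>();
        boolean usePrefix = isPlainPrefixPattern(pattern);
        String prefix = usePrefix ? pattern.substring(0, pattern.length() - 1) : null;
        for (String term : terms) {
            if (usePrefix ? prefixMatches(term, prefix) : wildcardMatches(term, pattern)) {
                out.add(term);
            }
        }
        return out;
    }
}
```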

A big thank you to Shalin for providing valuable guidance and insight.

One final request for comment to Shalin on this topic: I am guessing you
ensured there were no duplicate terms in the field(s) used for
autocompletion. For our first version, I am thinking of eliminating the
duplicates outside the results handler that returns suggestions, since in
our system duplicate suggestions come only from different document IDs, and
we do want the list of matched document IDs. Is there a better or different
way of doing this?
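
A minimal Java sketch of the de-duplication described here, done in application code after the prefix query returns. It assumes each hit has already been reduced to a (document ID, suggestion text) pair in ranked order; the Hit type and method names are hypothetical. Each distinct suggestion is kept once, at the position of its best-ranked occurrence, together with every document ID that produced it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical post-processing step; not part of Solr.
final class SuggestionDeduper {

    // One ranked hit: the matched document and the field value to suggest.
    static final class Hit {
        final String docId;
        final String suggestion;

        Hit(String docId, String suggestion) {
            this.docId = docId;
            this.suggestion = suggestion;
        }
    }

    // Collapses hits that share the same suggestion text. LinkedHashMap keeps
    // insertion order, so the best-ranked occurrence fixes each suggestion's position.
    static Map<String, List<String>> dedupe(List<Hit> rankedHits, int maxSuggestions) {
        Map<String, List<String>> bySuggestion = new LinkedHashMap<>();
        for (Hit hit : rankedHits) {
            List<String> ids = bySuggestion.get(hit.suggestion);
            if (ids == null) {
                if (bySuggestion.size() == maxSuggestions) {
                    continue; // list is full; skip suggestions not already kept
                }
                ids = new ArrayList<>();
                bySuggestion.put(hit.suggestion, ids);
            }
            ids.add(hit.docId); // still record every matching document ID
        }
        return bySuggestion;
    }
}
```

Because duplicates consume result rows, the query may need to request more rows than the number of suggestions to be displayed.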

Regards,

Prasanna.


Re: Implementing Autocomplete/Query Suggest using Solr

2009-12-28 Thread Prasanna R
On Wed, Dec 23, 2009 at 10:52 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Thu, Dec 24, 2009 at 2:39 AM, Prasanna R plistma...@gmail.com wrote:

  On Tue, Dec 22, 2009 at 11:49 PM, Shalin Shekhar Mangar 
  shalinman...@gmail.com wrote:
 
  
   I am curious how an approach that simply uses the wildcard query
   functionality on an indexed field would work.
  
  
   It works fine as long as the terms are not repeated across documents.
  
  
   I do not follow why terms repeating across documents would be an issue.
  As long as you can differentiate between multiple matches and rank them
  properly, it should work, right?
 
 
 A prefix search would return documents. If a field X being used for
 auto-complete has the same value in two documents, then the user will see
 the same value being suggested twice.


 That is right. I will have to handle removing duplicate values from the
results returned by the result handler.


  We do auto-complete through prefix searches on shingles.
 

 Just to confirm, do you mean using the EdgeNGram filter to produce letter
 n-grams of the tokens in the chosen field?



 No, I'm talking about prefix search on tokens produced by a ShingleFilter.


I did not know about the prefix query parser in Solr. Thanks a lot for
pointing it out.

I have found relatively little material online about the Solr/Lucene prefix
query parser. Please point me to any useful resources I might be missing.

Thanks again for all your help.

Regards,

Prasanna.


RE: Implementing Autocomplete/Query Suggest using Solr

2009-12-23 Thread Ankit Bhatnagar
In addition to what Shalin said, you could use the TermsComponent.
However, you will be better off using the dismax request handler.
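
A TermsComponent request for completions might be built as in the Java sketch below. It assumes a request handler registered at /terms that includes the TermsComponent, plus a hypothetical host and field name; terms, terms.fl, terms.prefix and terms.limit are TermsComponent request parameters. Because the component enumerates indexed terms rather than documents, each term comes back once, but no document IDs or per-field boosts are involved.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch of a TermsComponent request URL; host, handler path and field are assumptions.
final class TermsRequest {

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Asks for up to `limit` indexed terms of `field` that start with `prefix`.
    static String url(String baseUrl, String field, String prefix, int limit) {
        return baseUrl + "/terms?terms=true"
                + "&terms.fl=" + enc(field)
                + "&terms.prefix=" + enc(prefix)
                + "&terms.limit=" + limit;
    }
}
```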

Ankit

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: Wednesday, December 23, 2009 2:49 AM
To: solr-user@lucene.apache.org
Subject: Re: Implementing Autocomplete/Query Suggest using Solr

On Wed, Dec 23, 2009 at 6:14 AM, Prasanna R plistma...@gmail.com wrote:


  I am curious how an approach that simply uses the wildcard query
 functionality on an indexed field would work.


It works fine as long as the terms are not repeated across documents.


 While Solr does not currently support wildcard queries out of the box,
 support for them will definitely be included in the future, and I believe
 the edismax parser already lets you do that.


Solr supports prefix queries and there's a reverse wild card filter in trunk
too.

We do auto-complete through prefix searches on shingles.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Implementing Autocomplete/Query Suggest using Solr

2009-12-23 Thread Prasanna R
On Tue, Dec 22, 2009 at 11:49 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:


   I am curious how an approach that simply uses the wildcard query
  functionality on an indexed field would work.


 It works fine as long as the terms are not repeated across documents.


I do not follow why terms repeating across documents would be an issue. As
long as you can differentiate between multiple matches and rank them
properly, it should work, right?



  While Solr does not currently support wildcard queries out of the box,
  support for them will definitely be included in the future, and I believe
  the edismax parser already lets you do that.


 Solr supports prefix queries and there's a reverse wild card filter in
 trunk too.


Are you referring to facet prefix queries when you say prefix queries? I
looked at the reversed wild card filter, but I think regular wild card
matching, as opposed to leading wild card matching, is better suited to an
auto-completion feature.


 We do auto-complete through prefix searches on shingles.


Just to confirm, do you mean using the EdgeNGram filter to produce letter
n-grams of the tokens in the chosen field?

Assuming the regular wild card query would also work, any thoughts on how it
compares to the EdgeNGram approach in terms of added indexing cost,
performance, etc.?

Thanks a lot for your valuable inputs/comments.

Prasanna.


Re: Implementing Autocomplete/Query Suggest using Solr

2009-12-23 Thread Shalin Shekhar Mangar
On Thu, Dec 24, 2009 at 2:39 AM, Prasanna R plistma...@gmail.com wrote:

 On Tue, Dec 22, 2009 at 11:49 PM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

 
   I am curious how an approach that simply uses the wildcard query
   functionality on an indexed field would work.
 
 
  It works fine as long as the terms are not repeated across documents.
 
 
  I do not follow why terms repeating across documents would be an issue.
 As long as you can differentiate between multiple matches and rank them
 properly, it should work, right?


A prefix search would return documents. If a field X being used for
auto-complete has the same value in two documents, then the user will see the
same value being suggested twice.



 
  While Solr does not currently support wildcard queries out of the box,
  support for them will definitely be included in the future, and I believe
  the edismax parser already lets you do that.
 
 
  Solr supports prefix queries and there's a reverse wild card filter in
  trunk too.
 

 Are you referring to facet prefix queries when you say prefix queries? I
 looked at the reversed wild card filter, but I think regular wild card
 matching, as opposed to leading wild card matching, is better suited to an
 auto-completion feature.


No, I'm talking about regular prefix search, e.g. field:val*
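
Two ways of expressing such a prefix search in a Solr request are sketched below in Java; the host and field names are hypothetical. The first is the standard syntax (field:val*), which the Lucene query parser turns into a PrefixQuery. The second uses Solr's prefix query parser ({!prefix f=field}val), which builds a PrefixQuery from the raw value without further query-syntax parsing, so a value containing spaces, such as a shingle prefix like "new yo", needs no escaping. In neither case is the prefix run through the field's analyzer, so it is assumed to be normalized (e.g. lowercased) to match the indexed terms.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper showing two request forms for a prefix search on one field.
final class PrefixSearchUrls {

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Standard query syntax, field:val* (val must not contain unescaped spaces
    // or other query-syntax characters).
    static String standard(String baseUrl, String field, String prefix, int rows) {
        return baseUrl + "/select?q=" + enc(field + ":" + prefix + "*") + "&rows=" + rows;
    }

    // Prefix query parser, {!prefix f=field}val (val is used verbatim as the prefix).
    static String prefixParser(String baseUrl, String field, String prefix, int rows) {
        return baseUrl + "/select?q=" + enc("{!prefix f=" + field + "}" + prefix) + "&rows=" + rows;
    }
}
```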



  We do auto-complete through prefix searches on shingles.
 

 Just to confirm, do you mean using the EdgeNGram filter to produce letter
 n-grams of the tokens in the chosen field?


No, I'm talking about prefix search on tokens produced by a ShingleFilter.


 Assuming the regular wild card query would also work, any thoughts on how
 it compares to the EdgeNGram approach in terms of added indexing cost,
 performance, etc.?


With EdgeNGram, you can do phrase (exact) matches, which are faster. But if
you have a big corpus of terms, then the EdgeNGramFilter can produce too many
tokens. In some places we use phrase search on n-grams; in other places
(with more terms) we opted for prefix search on shingles.
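
To make the trade-off concrete, the Java sketch below models two index layouts on a toy scale. It is a simplified, hypothetical model, not Lucene's EdgeNGramTokenFilter or ShingleFilter: the edge n-gram layout indexes every leading substring of the whole field value (as with a keyword tokenizer), so a completion is an exact term lookup, while the shingle layout indexes unigrams plus word shingles joined by a space, so a completion is a prefix expansion over the sorted term dictionary. It also counts the distinct terms each layout adds, which is only one part of indexing cost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Simplified, hypothetical model of two auto-complete index layouts.
final class AutocompleteModel {

    // Front edge n-grams of the whole value, lengths minGram..maxGram.
    // Lookup in this layout: exact match on the typed text.
    static List<String> edgeNGrams(String value, int minGram, int maxGram) {
        String v = value.toLowerCase(Locale.ROOT).trim();
        List<String> out = new ArrayList<>();
        for (int len = minGram; len <= Math.min(maxGram, v.length()); len++) {
            out.add(v.substring(0, len));
        }
        return out;
    }

    // Unigrams plus word shingles of size 2..maxShingleSize, joined by a space.
    // Lookup in this layout: prefix match on the typed text.
    static List<String> shingles(String value, int maxShingleSize) {
        String[] words = value.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            StringBuilder sb = new StringBuilder(words[i]);
            out.add(sb.toString()); // unigram
            for (int n = 2; n <= maxShingleSize && i + n <= words.length; n++) {
                sb.append(' ').append(words[i + n - 1]);
                out.add(sb.toString());
            }
        }
        return out;
    }

    // Sorted shingle term dictionary for a set of field values.
    static TreeSet<String> shingleTerms(List<String> values, int maxShingleSize) {
        TreeSet<String> terms = new TreeSet<>();
        for (String v : values) {
            terms.addAll(shingles(v, maxShingleSize));
        }
        return terms;
    }

    // Terms a prefix query would expand to: walk the sorted terms from the
    // prefix and stop at the first term that no longer starts with it.
    static List<String> prefixExpand(TreeSet<String> sortedTerms, String typed) {
        String prefix = typed.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (String t : sortedTerms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) {
                break;
            }
            out.add(t);
        }
        return out;
    }

    // Distinct terms each layout adds to the index: {edge n-gram, shingle}.
    static int[] distinctTermCounts(List<String> values, int minGram, int maxGram, int maxShingleSize) {
        TreeSet<String> edge = new TreeSet<>();
        for (String v : values) {
            edge.addAll(edgeNGrams(v, minGram, maxGram));
        }
        return new int[] {edge.size(), shingleTerms(values, maxShingleSize).size()};
    }
}
```

For the two values "New York" and "New Jersey", with gram sizes 1 to 20 and shingles of up to two words, the edge n-gram layout yields 14 distinct terms against 5 for the shingle layout; real ratios depend on value lengths, gram limits and vocabulary.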

-- 
Regards,
Shalin Shekhar Mangar.


Re: Implementing Autocomplete/Query Suggest using Solr

2009-12-22 Thread Shalin Shekhar Mangar
On Wed, Dec 23, 2009 at 6:14 AM, Prasanna R plistma...@gmail.com wrote:


  I am curious how an approach that simply uses the wildcard query
 functionality on an indexed field would work.


It works fine as long as the terms are not repeated across documents.


 While Solr does not currently support wildcard queries out of the box,
 support for them will definitely be included in the future, and I believe
 the edismax parser already lets you do that.


Solr supports prefix queries and there's a reverse wild card filter in trunk
too.
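
For context on the reversed wild card filter (ReversedWildcardFilterFactory in Solr), the idea can be modelled in a few lines of Java: at index time each token is also stored reversed, behind a marker character so reversed forms cannot collide with ordinary terms, and a leading wild card query such as *ing becomes a prefix scan over the reversed terms. The marker value and all names are illustrative assumptions; the real filter has more options and rewrite rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Illustrative model of indexing reversed tokens to answer leading-wildcard queries.
final class ReversedWildcard {

    static final char MARKER = '\u0001'; // assumed marker prefix for reversed terms

    static String reversed(String token) {
        return MARKER + new StringBuilder(token).reverse().toString();
    }

    // Index time: keep the original token and add its marked, reversed form.
    static TreeSet<String> index(List<String> tokens) {
        TreeSet<String> terms = new TreeSet<>();
        for (String t : tokens) {
            terms.add(t);
            terms.add(reversed(t));
        }
        return terms;
    }

    // Query time: "*suffix" becomes a prefix scan over the reversed terms;
    // each match is reversed back to its original form.
    static List<String> leadingWildcard(TreeSet<String> terms, String query) {
        if (!query.startsWith("*") || query.indexOf('*', 1) >= 0 || query.indexOf('?') >= 0) {
            throw new IllegalArgumentException("only '*suffix' queries are handled here");
        }
        String prefix = reversed(query.substring(1));
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) {
                break; // sorted order: no later term can match
            }
            out.add(new StringBuilder(t.substring(1)).reverse().toString());
        }
        return out;
    }
}
```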

We do auto-complete through prefix searches on shingles.

-- 
Regards,
Shalin Shekhar Mangar.