Re: catchall fields or multiple fields
Thanks for your suggestion, Jack. In fact we're doing geographic search (the fields are country, state, county, town, hamlet, district), so it's difficult to split.

Best regards,
Elisabeth

2015-10-13 16:01 GMT+02:00 Jack Krupansky:

> Performing a sequence of queries can help too. For example, if users
> commonly search for a product name, you could do an initial query on just
> the product name field, which should be much faster than searching the text
> of all product descriptions, and highlighting would be less problematic. If
> that initial query comes up empty, then you could move on to the next
> most likely field, maybe product title (short one-line description), and
> query voluminous fields like detailed product descriptions,
> specifications, and user comments/reviews only as a last resort.
>
> -- Jack Krupansky
Re: catchall fields or multiple fields
Thanks to you all for the informed advice.

Thanks, Trey, for your very detailed point of view. It is now very clear to me how a search on multiple fields can grow slower than a search on a catchall field.

Our current search model is problematic: we search on a catchall field but need to know which fields matched, so we do highlighting on multiple fields (not indexed, but stored). To improve performance, we want to get rid of highlighting and use the Solr explain output instead. To get the explain output for those fields, we need to search on those fields.

So I guess we have to test whether removing highlighting and adding a multi-field search will improve performance or not.

Best regards,
Elisabeth

2015-10-12 17:55 GMT+02:00 Jack Krupansky:

> I think it may all depend on the nature of your application and how much
> commonality there is between fields.
>
> One interesting area is auto-suggest: although you can certainly suggest
> from the union of all fields, you may want to give priority to suggestions
> from preferred fields, for example actual product names or important
> keywords rather than random words from the English language that happen to
> occur in descriptions, all of which would occur in a catchall.
>
> -- Jack Krupansky
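The idea of reading the matching fields from the explain output (usually requested with debugQuery=true) could be sketched roughly as follows. This is only an illustration: the sample explain text is made up, the field names are taken from the geographic fields mentioned in this thread, and the exact layout of real explain output depends on the Solr version and similarity in use.

```python
import re

# Illustrative explain text only (an assumption, not captured from a real server);
# real Solr output varies by version and configured similarity.
SAMPLE_EXPLAIN = """
0.61 = (MATCH) sum of:
  0.35 = (MATCH) weight(town:paris in 12) [DefaultSimilarity], result of:
    0.35 = fieldWeight in 12, product of:
  0.26 = (MATCH) weight(district:paris in 12) [DefaultSimilarity], result of:
    0.26 = fieldWeight in 12, product of:
"""

# Matches the field name inside clauses shaped like "weight(field:term ...".
WEIGHT_CLAUSE = re.compile(r"weight\((\w+):")


def matched_fields(explain_text):
    """Return field names found in weight(field:term ...) clauses, in first-seen order."""
    seen = []
    for field in WEIGHT_CLAUSE.findall(explain_text):
        if field not in seen:
            seen.append(field)
    return seen


fields = matched_fields(SAMPLE_EXPLAIN)
```

Whether this ends up cheaper than highlighting would still have to be measured, since the query must then target the individual fields.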
Re: catchall fields or multiple fields
Performing a sequence of queries can help too. For example, if users commonly search for a product name, you could do an initial query on just the product name field, which should be much faster than searching the text of all product descriptions, and highlighting would be less problematic. If that initial query comes up empty, then you could move on to the next most likely field, maybe product title (short one-line description), and query voluminous fields like detailed product descriptions, specifications, and user comments/reviews only as a last resort.

-- Jack Krupansky

On Tue, Oct 13, 2015 at 6:17 AM, elisabeth benoit wrote:

> Thanks to you all for the informed advice.
>
> Thanks, Trey, for your very detailed point of view. It is now very clear to
> me how a search on multiple fields can grow slower than a search on a
> catchall field.
>
> Our current search model is problematic: we search on a catchall field but
> need to know which fields matched, so we do highlighting on multiple fields
> (not indexed, but stored). To improve performance, we want to get rid of
> highlighting and use the Solr explain output instead. To get the explain
> output for those fields, we need to search on those fields.
>
> So I guess we have to test whether removing highlighting and adding a
> multi-field search will improve performance or not.
>
> Best regards,
> Elisabeth
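The sequence-of-queries idea above can be sketched as a small cascade. This is a minimal illustration, not Solr client code: `cascade_search`, `toy_search`, the tier list and the sample documents are all hypothetical names invented here, and in practice the `search` callable would issue a real query restricted to the given fields.

```python
from typing import Callable, List, Sequence

# Hypothetical in-memory documents standing in for an index.
DOCS = [
    {"id": 1, "name": "acme drill", "title": "cordless drill",
     "description": "a powerful hammer action"},
    {"id": 2, "name": "acme hammer", "title": "claw hammer",
     "description": "steel head"},
]

# Cheapest, most likely fields first; voluminous fields last.
TIERS = [["name"], ["title"], ["description"]]


def toy_search(query: str, fields: Sequence[str]) -> List[dict]:
    """Stand-in for a real search call: exact word match in any of the given fields."""
    q = query.lower()
    return [d for d in DOCS if any(q in d[f].lower().split() for f in fields)]


def cascade_search(
    query: str,
    field_tiers: Sequence[Sequence[str]],
    search: Callable[[str, Sequence[str]], List[dict]],
) -> List[dict]:
    """Query each tier of fields in order and return the first non-empty result."""
    for fields in field_tiers:
        hits = search(query, fields)
        if hits:
            return hits
    return []


name_hits = cascade_search("hammer", TIERS, toy_search)
fallback_hits = cascade_search("powerful", TIERS, toy_search)
```

Here "hammer" is answered by the first tier alone, while "powerful" falls through to the description tier. The trade-off is extra round trips for queries that only match in the later tiers.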
Re: catchall fields or multiple fields
I think it may all depend on the nature of your application and how much commonality there is between fields.

One interesting area is auto-suggest: although you can certainly suggest from the union of all fields, you may want to give priority to suggestions from preferred fields, for example actual product names or important keywords rather than random words from the English language that happen to occur in descriptions, all of which would occur in a catchall.

-- Jack Krupansky

On Mon, Oct 12, 2015 at 8:39 AM, elisabeth benoit wrote:

> Hello,
>
> We're using solr 4.10 and storing all data in a catchall field. It seems to
> me that one good reason for using a catchall field is when using scoring
> with idf (with idf, a word might not have the same score in all fields). We
> got rid of idf and are now considering using multiple fields. I remember
> reading somewhere that using a catchall field might speed up search
> time. I was wondering if some of you have any opinion (or experience)
> related to this subject.
>
> Best regards,
> Elisabeth
Re: catchall fields or multiple fields
Elisabeth,

Yes, it will almost always be more efficient to search within a catch-all field than to search across multiple fields. Think of it this way: when you search on a single field, you are doing a single keyword search against the index per term. When you search across multiple fields, you are executing the search for that term multiple times (once for each field) against the index, and then doing the necessary intersections/unions/etc. of the document sets. As you continue to add more and more fields to search across, the search continues to grow slower. If you're only searching a few fields then it will probably not be noticeably slower, but the more and more you add, the slower your response times will become. This slowdown may be measured in milliseconds, in which case you may not care, but it will be slower.

The idf point you mentioned can be both a pro and a con depending upon the use case. For example, if you are searching news content that has a "french_text" field and an "english_text" field, it would be suboptimal if for the search "Barack Obama" you got only French documents at the top because the US president's name is much more commonly found in English documents. When you're searching fields with different types of content, however, you might find examples where you'd actually want idf differences maintained and documents differentiated based upon underlying field.

One particularly nice thing about the multi-field approach is that it is very easy to apply different boosts to the fields and to dynamically change the boosts. You can similarly do this with payloads within a catch-all field. You could even assign each term a payload corresponding to which field the content came from, and then dynamically change the boosts associated with those payloads at query time (caveat - custom code required). See this blog post for an end-to-end payload scoring example, https://lucidworks.com/blog/2014/06/13/end-to-end-payload-example-in-solr/.
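As a rough illustration of the payload idea just described, the toy model below tags every token of a catch-all field with the field it came from and applies per-field weights only at query time. It is a conceptual sketch in plain Python, not Lucene's payload API: `Token`, `index_catchall`, `score` and the sample document are hypothetical, and a real implementation needs the custom code mentioned above.

```python
from collections import namedtuple

# A token in the catch-all field; the payload is the (hypothetical) source field name.
Token = namedtuple("Token", ["term", "source_field"])


def index_catchall(doc):
    """Flatten a multi-field document into one token stream, keeping the source field as payload."""
    return [
        Token(term, field)
        for field, text in doc.items()
        for term in text.lower().split()
    ]


def score(tokens, query_terms, weights, default_weight=1.0):
    """Sum the query-time weight attached to each matching token's payload."""
    wanted = {t.lower() for t in query_terms}
    return sum(
        weights.get(tok.source_field, default_weight)
        for tok in tokens
        if tok.term in wanted
    )


doc = {"title": "Java developer", "body": "We need a developer who knows Java and Solr"}
tokens = index_catchall(doc)

# The same indexed tokens scored under two different query-time weightings.
title_heavy = score(tokens, ["java"], {"title": 8.0, "body": 1.0})
body_heavy = score(tokens, ["java"], {"title": 1.0, "body": 2.0})
```

The point of the sketch is that changing the weights requires no re-indexing, and that only one token stream is scanned however many source fields were folded into it.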
Sharing my personal experience: at CareerBuilder, we use the catch-all field with payloads (one per underlying field) whose weights we can dynamically change at query time. We found that for most of our corpus sizes (ranging between 2 and 100 million full-text jobs or resumes), it is more efficient to search between 1 and 3 separate fields than to do the payload-scored search on the catch-all field, but once we get to the 4th field the extra cost associated with the payload scoring was overtaken by the additional time required to search each additional field. These numbers (3 vs 4 fields, etc.) are all anecdotal, of course, as they depend on a lot of environmental and corpus factors unique to our use case. The main point of this approach, however, is that there is no additional per-field cost beyond the upfront cost to add and score payloads, so we have been able to easily represent over a hundred of these payload-based "virtual fields" with different weights within a catch-all field (all with a fixed query-time cost).

*In summary*: yes, you should expect a performance decline as you add more and more fields to your query if you are searching across multiple fields. You can overcome this by using a single catch-all field if you are okay losing IDF per field (you'll still have it globally across all fields). If you want to use a catch-all field, but still want to boost content based upon the field it originated within, you can accomplish this with payloads.

All the best,

Trey Grainger
Co-author, Solr in Action
Director of Engineering, Search & Recommendations @ CareerBuilder

On Mon, Oct 12, 2015 at 9:12 AM, Ahmet Arslan wrote:

> Hi,
>
> Catch-all field: No need to worry about how to aggregate scores coming
> from different fields.
> But you cannot utilize different analysers for different fields.
>
> Multiple-fields: You can play with edismax's parameters on-the-fly,
> without having to re-index.
> It is flexible: you can include or exclude fields from the search.
>
> Ahmet
catchall fields or multiple fields
Hello,

We're using solr 4.10 and storing all data in a catchall field. It seems to me that one good reason for using a catchall field is when using scoring with idf (with idf, a word might not have the same score in all fields). We got rid of idf and are now considering using multiple fields. I remember reading somewhere that using a catchall field might speed up search time. I was wondering if some of you have any opinion (or experience) related to this subject.

Best regards,
Elisabeth
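To make the parenthetical about idf concrete, here is a small worked sketch. It assumes the classic Lucene TF-IDF similarity, whose idf is 1 + ln(numDocs / (docFreq + 1)); the document counts and the `town` and `catchall` field names are invented for illustration.

```python
import math


def classic_idf(doc_freq, num_docs):
    """idf as defined by the classic Lucene TF-IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


NUM_DOCS = 1_000_000  # assumed index size

# Assumed document frequencies of the same word in two different fields.
idf_in_town = classic_idf(doc_freq=50, num_docs=NUM_DOCS)
idf_in_catchall = classic_idf(doc_freq=5000, num_docs=NUM_DOCS)

# The word is rarer in the narrow field, so it scores higher there.
rarer_field_scores_higher = idf_in_town > idf_in_catchall
```

Because the document frequency is counted per field, the same word can carry a noticeably different idf depending on which field it is matched in.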
Re: catchall fields or multiple fields
Hi,

Catch-all field: No need to worry about how to aggregate scores coming from different fields.
But you cannot utilize different analysers for different fields.

Multiple-fields: You can play with edismax's parameters on-the-fly, without having to re-index.
It is flexible: you can include or exclude fields from the search.

Ahmet

On Monday, October 12, 2015 3:39 PM, elisabeth benoit wrote:

> Hello,
>
> We're using solr 4.10 and storing all data in a catchall field. It seems to
> me that one good reason for using a catchall field is when using scoring
> with idf (with idf, a word might not have the same score in all fields). We
> got rid of idf and are now considering using multiple fields. I remember
> reading somewhere that using a catchall field might speed up search
> time. I was wondering if some of you have any opinion (or experience)
> related to this subject.
>
> Best regards,
> Elisabeth
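For readers unfamiliar with the edismax point above, the fields to search and their boosts are passed per request in the `qf` parameter, so they can be changed without re-indexing. The sketch below only builds the request parameters; the helper name `edismax_params` and the field names and boost values are assumptions for illustration, and sending the request is left out.

```python
from urllib.parse import urlencode


def edismax_params(user_query, field_boosts):
    """Build edismax request parameters from a {field: boost} mapping."""
    qf = " ".join(
        f"{field}^{boost}" if boost != 1 else field
        for field, boost in field_boosts.items()
    )
    return {"q": user_query, "defType": "edismax", "qf": qf}


# Search three fields with different boosts...
params = edismax_params("paris", {"town": 4, "district": 2, "country": 1})
query_string = urlencode(params)

# ...or exclude fields from the search simply by leaving them out of qf.
town_only = edismax_params("paris", {"town": 4})
```

Including or excluding a field is then just a matter of what goes into the mapping for that request.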
Re: catchall fields or multiple fields
Why get rid of idf? Most often, idf is a big help in relevance.

I’ve used different weights for different parts of the document, like weighting the title 8X the body.

I’ve used different weights for different analysis chains. If we have three fields, one lowercased, one stemmed, and one a phonetic representation, then you can weight the lower case higher than the stemmed field, and stemmed higher than phonetic.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On Oct 12, 2015, at 6:12 AM, Ahmet Arslan wrote:

> Hi,
>
> Catch-all field: No need to worry about how to aggregate scores coming from
> different fields.
> But you cannot utilize different analysers for different fields.
>
> Multiple-fields: You can play with edismax's parameters on-the-fly, without
> having to re-index.
> It is flexible that you can include/exclude fields from search.
>
> Ahmet