Re: How to do custom sorting in Solr?
You may want to look at http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html. While it is not the same requirement, it should give you an idea of how to do custom sorting.

Thanks
Afroz

On Sun, Jun 10, 2012 at 4:43 PM, roz dev rozde...@gmail.com wrote:

Yes, these documents have lots of unique values, as the same product can be assigned to many other categories, each time with a different sort order.

We did some evaluation of heap usage and found that, with the kind of queries we generate, heap usage was going up to 24-26 GB. I could trace it to the fact that fieldCache is creating an array of 2M size for each of the sort fields. Since the same products are mapped to multiple categories, we incur significant memory overhead. Therefore, any solution that reduces memory consumption is a good one for me.

In fact, we have situations where the same product is mapped to more than one sub-category in the same category, like

Books
-- Programming
- Java in a nutshell
-- Sale (40% off)
- Java in a nutshell

So, another thought in my mind is to somehow use a second-pass collector to group books appropriately in the Programming and Sale categories, with the right sort order. But I have no clue about that piece :(

-Saroj

On Sun, Jun 10, 2012 at 4:30 PM, Erick Erickson erickerick...@gmail.com wrote:

2M docs is actually pretty small. Sorting is sensitive to the number of _unique_ values in the sort fields, not necessarily the number of documents. And sorting only works on fields with a single value (i.e. it can't have more than one token after analysis). So for each field you're only talking 2M values at the very maximum, assuming that the field in question has a unique value per document, which I doubt very much given your problem description.

So with a corpus that size, I'd just try it.

Best
Erick

On Sun, Jun 10, 2012 at 7:12 PM, roz dev rozde...@gmail.com wrote:

Thanks Erick for your quick feedback.

When products are assigned to a category or sub-category, they can be in any order, and the price type can be regular or markdown. So regular and markdown products are intermingled as per their assignment, but I want to sort them in such a way that all the products which are on markdown are at the bottom of the list.

I can use these multiple sorts, but I realize that they are costly in terms of heap used, as they are using FieldCache. I have an index with 2M docs, and the docs are pretty big. So I don't want to use them unless there is no other option.

I am wondering if I can define a custom function query which works like this:

- check if the product is on markdown
- if yes, then change its sort order field to be the max value in the given sub-category, say 99
- else, use the sort order of the product in the sub-category

I have been looking at existing function queries but do not have a good handle on how to make one of my own.

- Another option could be to use a custom sort comparator, but I am not sure about the way it works.

Any thoughts?

-Saroj

On Sun, Jun 10, 2012 at 5:02 AM, Erick Erickson erickerick...@gmail.com wrote:

Skimming this, two options come to mind:

1 Simply apply primary, secondary, etc. sorts. Something like
sort=subcategory asc,markdown_or_regular desc,sort_order asc

2 You could also use grouping to arrange things in groups and sort within those groups. This has the advantage of returning some members of each of the top N groups in the result set, which makes it easier to get some of each group rather than having to analyze the whole list.

But your example is somewhat contradictory. You say
"products which are on markdown, are at the bottom of the documents list"
But in your examples, products on markdown are intermingled.

Best
Erick

On Sun, Jun 10, 2012 at 3:36 AM, roz dev rozde...@gmail.com wrote:

Hi All

I have an index which contains a catalog of products and categories, with Solr 4.0 from trunk.

Data is organized like this:

Category: Books

Sub Category: Programming
Products:
Product # 1, Price: Regular, Sort Order: 1
Product # 2, Price: Markdown, Sort Order: 2
Product # 3, Price: Regular, Sort Order: 3
Product # 4, Price: Regular, Sort Order: 4
...
Product # 100, Price: Regular, Sort Order: 100

Sub Category: Fiction
Products:
Product # 1, Price: Markdown, Sort Order: 1
Product # 2, Price: Regular, Sort Order: 2
Product # 3, Price: Regular, Sort Order: 3
Product # 4, Price: Markdown, Sort Order: 4
...
Product # 70, Price: Regular, Sort Order: 70

I want to query Solr and sort these products within each
Re: How to do custom sorting in Solr?
Hi All

I have an index which contains a catalog of products and categories, with Solr 4.0 from trunk.

Data is organized like this:

Category: Books

Sub Category: Programming
Products:
Product # 1, Price: Regular, Sort Order: 1
Product # 2, Price: Markdown, Sort Order: 2
Product # 3, Price: Regular, Sort Order: 3
Product # 4, Price: Regular, Sort Order: 4
...
Product # 100, Price: Regular, Sort Order: 100

Sub Category: Fiction
Products:
Product # 1, Price: Markdown, Sort Order: 1
Product # 2, Price: Regular, Sort Order: 2
Product # 3, Price: Regular, Sort Order: 3
Product # 4, Price: Markdown, Sort Order: 4
...
Product # 70, Price: Regular, Sort Order: 70

I want to query Solr and sort these products within each sub-category in such a way that products which are on markdown are at the bottom of the documents list, and the other products, which are on regular price, are sorted as per their sort order in their sub-category.

Expected results are:

Category: Books

Sub Category: Programming
Products:
Product # 1, Price: Regular, Sort Order: 1
Product # 2, Price: Markdown, Sort Order: 101
Product # 3, Price: Regular, Sort Order: 3
Product # 4, Price: Regular, Sort Order: 4
...
Product # 100, Price: Regular, Sort Order: 100

Sub Category: Fiction
Products:
Product # 1, Price: Markdown, Sort Order: 71
Product # 2, Price: Regular, Sort Order: 2
Product # 3, Price: Regular, Sort Order: 3
Product # 4, Price: Markdown, Sort Order: 71
...
Product # 70, Price: Regular, Sort Order: 70

My query is like this:

q=*:*&fq=category:Books

What are the options to implement custom sorting, and how do I do it?

- Define a custom function query?
- Define a custom comparator? Or,
- Define a custom collector?

Please let me know the best way to go about it and any pointers to customize Solr 4.

Thanks
Saroj
Re: How to do custom sorting in Solr?
Skimming this, two options come to mind:

1 Simply apply primary, secondary, etc. sorts. Something like
sort=subcategory asc,markdown_or_regular desc,sort_order asc

2 You could also use grouping to arrange things in groups and sort within those groups. This has the advantage of returning some members of each of the top N groups in the result set, which makes it easier to get some of each group rather than having to analyze the whole list.

But your example is somewhat contradictory. You say
"products which are on markdown, are at the bottom of the documents list"
But in your examples, products on markdown are intermingled.

Best
Erick
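For illustration, the two suggestions in this reply could be written as Solr request parameters roughly as follows. This is a minimal sketch, not a tested configuration: the field names subcategory, markdown_or_regular and sort_order are taken from the example sort string and are assumed to exist in the schema as single-valued fields, and the group.limit value is an arbitrary choice. The desc direction only puts regular-price products first if the field's values happen to sort that way (for example the strings "regular" and "markdown"), which is also an assumption. The code only builds query strings; it does not contact a Solr server.

```python
from urllib.parse import urlencode

# Base query from the original post: everything in the Books category.
BASE = {"q": "*:*", "fq": "category:Books"}


def multi_sort_params():
    """Option 1: primary, secondary and tertiary sorts in one sort parameter."""
    params = dict(BASE)
    params["sort"] = "subcategory asc,markdown_or_regular desc,sort_order asc"
    return params


def grouped_params(limit=10):
    """Option 2: group by sub-category and sort the documents inside each group."""
    params = dict(BASE)
    params.update({
        "group": "true",
        "group.field": "subcategory",
        "group.limit": str(limit),  # documents returned per group (arbitrary here)
        "group.sort": "markdown_or_regular desc,sort_order asc",
    })
    return params


if __name__ == "__main__":
    # Print the URL-encoded query strings that would follow /select?
    print(urlencode(multi_sort_params()))
    print(urlencode(grouped_params()))
```

The first variant returns one flat list ordered by sub-category; the second returns a separate, limited list per sub-category, which matches the "some members of each of the top N groups" behaviour described above.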
Re: How to do custom sorting in Solr?
Thanks Erick for your quick feedback.

When products are assigned to a category or sub-category, they can be in any order, and the price type can be regular or markdown. So regular and markdown products are intermingled as per their assignment, but I want to sort them in such a way that all the products which are on markdown are at the bottom of the list.

I can use these multiple sorts, but I realize that they are costly in terms of heap used, as they are using FieldCache. I have an index with 2M docs, and the docs are pretty big. So I don't want to use them unless there is no other option.

I am wondering if I can define a custom function query which works like this:

- check if the product is on markdown
- if yes, then change its sort order field to be the max value in the given sub-category, say 99
- else, use the sort order of the product in the sub-category

I have been looking at existing function queries but do not have a good handle on how to make one of my own.

- Another option could be to use a custom sort comparator, but I am not sure about the way it works.

Any thoughts?

-Saroj
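The rule described in this message (markdown products take a fixed maximum sort value, everything else keeps its own sort order) can be sketched in plain Python, together with one way it might be written as a Solr sort-by-function parameter. This is only an illustration under assumptions: price_type and sort_order are hypothetical field names, the indexed term 'markdown' is a hypothetical value, 99 is the example maximum from the message, and the if() and termfreq() function queries are assumed to be available in the Solr build in use.

```python
MARKDOWN_SORT_VALUE = 99  # example maximum from the message; not a Solr default


def effective_sort_order(price_type, sort_order, max_value=MARKDOWN_SORT_VALUE):
    """Return the value to sort on: markdown products are pushed to max_value."""
    return max_value if price_type == "markdown" else sort_order


def function_sort_param(max_value=MARKDOWN_SORT_VALUE):
    """Build a sort parameter that expresses the same rule as a function query.

    price_type and sort_order are hypothetical field names; termfreq() is
    non-zero only when the document's price_type contains the term 'markdown'.
    """
    return "if(termfreq(price_type,'markdown'),{0},sort_order) asc".format(max_value)


if __name__ == "__main__":
    products = [
        {"id": 1, "price_type": "regular", "sort_order": 1},
        {"id": 2, "price_type": "markdown", "sort_order": 2},
        {"id": 3, "price_type": "regular", "sort_order": 3},
        {"id": 4, "price_type": "regular", "sort_order": 4},
    ]
    ranked = sorted(
        products,
        key=lambda p: effective_sort_order(p["price_type"], p["sort_order"]),
    )
    print([p["id"] for p in ranked])  # prints [1, 3, 4, 2]
    print(function_sort_param())
```

A fixed constant only works if it exceeds every real sort order in the sub-category. All markdown products also receive the same value, so their order relative to each other is not defined by this rule alone.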
Re: How to do custom sorting in Solr?
2M docs is actually pretty small. Sorting is sensitive to the number of _unique_ values in the sort fields, not necessarily the number of documents. And sorting only works on fields with a single value (i.e. it can't have more than one token after analysis). So for each field you're only talking 2M values at the very maximum, assuming that the field in question has a unique value per document, which I doubt very much given your problem description.

So with a corpus that size, I'd just try it.

Best
Erick
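A back-of-the-envelope estimate helps put "2M values at the very maximum" in perspective. The sketch below assumes each sort field is cached as one 4-byte integer per document; that entry size is an assumption for integer fields, and string sort fields carry additional per-term overhead that is not modelled here. The field counts in the loop are hypothetical.

```python
BYTES_PER_INT = 4  # assumed size of one cached value for an integer sort field


def sort_cache_bytes(num_docs, num_sort_fields, bytes_per_entry=BYTES_PER_INT):
    """Estimate heap used by per-document sort arrays: one entry per doc per field."""
    return num_docs * num_sort_fields * bytes_per_entry


if __name__ == "__main__":
    docs = 2_000_000
    for fields in (1, 3, 100):
        mib = sort_cache_bytes(docs, fields) / (1024 * 1024)
        print("{0} sort field(s): {1:.1f} MiB".format(fields, mib))
```

Under these assumptions a single sort field costs 8,000,000 bytes, so the three-field sort suggested earlier in the thread would need about 24 MB for the arrays themselves.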
Re: How to do custom sorting in Solr?
Yes, these documents have lots of unique values, as the same product can be assigned to many other categories, each time with a different sort order.

We did some evaluation of heap usage and found that, with the kind of queries we generate, heap usage was going up to 24-26 GB. I could trace it to the fact that fieldCache is creating an array of 2M size for each of the sort fields. Since the same products are mapped to multiple categories, we incur significant memory overhead. Therefore, any solution that reduces memory consumption is a good one for me.

In fact, we have situations where the same product is mapped to more than one sub-category in the same category, like

Books
-- Programming
- Java in a nutshell
-- Sale (40% off)
- Java in a nutshell

So, another thought in my mind is to somehow use a second-pass collector to group books appropriately in the Programming and Sale categories, with the right sort order. But I have no clue about that piece :(

-Saroj