Re: How to do custom sorting in Solr?

2012-06-11 Thread Afroz Ahmad
You may want to look at
http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html.
While it is not the same requirement, this should give you an idea of how
to do custom sorting.

Thanks
Afroz


Re: How to do custom sorting in Solr?

2012-06-10 Thread roz dev
Hi All


 I have an index, built with Solr 4.0 from trunk, that contains a catalog of
 products and categories.

 Data is organized like this:

 Category: Books

 Sub Category: Programming

 Products:

 Product # 1, Price: Regular, Sort Order: 1
 Product # 2, Price: Markdown, Sort Order: 2
 Product # 3, Price: Regular, Sort Order: 3
 Product # 4, Price: Regular, Sort Order: 4
 ...
 Product # 100, Price: Regular, Sort Order: 100

 Sub Category: Fiction

 Products:

 Product # 1, Price: Markdown, Sort Order: 1
 Product # 2, Price: Regular, Sort Order: 2
 Product # 3, Price: Regular, Sort Order: 3
 Product # 4, Price: Markdown, Sort Order: 4
 ...
 Product # 70, Price: Regular, Sort Order: 70


 I want to query Solr and sort these products within each sub-category so
 that products on markdown are at the bottom of the list, while products at
 regular price are sorted by their sort order within their sub-category.

 The expected results are:

 Category: Books

 Sub Category: Programming

 Products:

 Product # 1, Price: Regular, Sort Order: 1
 Product # 2, Price: Markdown, Sort Order: 101
 Product # 3, Price: Regular, Sort Order: 3
 Product # 4, Price: Regular, Sort Order: 4
 ...
 Product # 100, Price: Regular, Sort Order: 100

 Sub Category: Fiction

 Products:

 Product # 1, Price: Markdown, Sort Order: 71
 Product # 2, Price: Regular, Sort Order: 2
 Product # 3, Price: Regular, Sort Order: 3
 Product # 4, Price: Markdown, Sort Order: 71
 ...
 Product # 70, Price: Regular, Sort Order: 70
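The expected values follow one rule: a markdown product gets the largest sort order in its sub-category plus one, and a regular product keeps its own sort order. The Java sketch below (Java 16+ for records; the class, record, and method names are hypothetical and not Solr API) recomputes that rule from the rows listed above, leaving out the elided "..." rows, and reproduces 101 for Programming and 71 for Fiction.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Recomputes the "expected results" rule from the question:
// markdown -> (max sort order in the sub-category) + 1, regular -> unchanged.
// Hypothetical names; only the rows shown in the question are used.
final class ExpectedSortOrder {

    record Product(String subCategory, int number, boolean markdown, int sortOrder) {}

    // Largest sort order seen in each sub-category.
    static Map<String, Integer> maxSortOrderBySubCategory(List<Product> products) {
        Map<String, Integer> max = new LinkedHashMap<>();
        for (Product p : products) {
            max.merge(p.subCategory(), p.sortOrder(), Integer::max);
        }
        return max;
    }

    // Sort key used for ordering within a sub-category.
    static int effectiveSortOrder(Product p, Map<String, Integer> max) {
        return p.markdown() ? max.get(p.subCategory()) + 1 : p.sortOrder();
    }

    static List<Product> sample() {
        return List.of(
            new Product("Programming", 1, false, 1),
            new Product("Programming", 2, true, 2),
            new Product("Programming", 3, false, 3),
            new Product("Programming", 4, false, 4),
            new Product("Programming", 100, false, 100),
            new Product("Fiction", 1, true, 1),
            new Product("Fiction", 2, false, 2),
            new Product("Fiction", 3, false, 3),
            new Product("Fiction", 4, true, 4),
            new Product("Fiction", 70, false, 70));
    }

    public static void main(String[] args) {
        List<Product> products = sample();
        Map<String, Integer> max = maxSortOrderBySubCategory(products);
        for (Product p : products) {
            System.out.printf("%s: Product # %d -> Sort Order: %d%n",
                p.subCategory(), p.number(), effectiveSortOrder(p, max));
        }
    }
}
```

With this rule the two Fiction markdown products tie at 71, so their relative order depends on whatever tie-breaker the sort applies.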


 My query is like this:

 q=*:*&fq=category:Books

 What are the options to implement custom sorting and how do I do it?


- Define a Custom Function query?
- Define a Custom Comparator? Or,
- Define a Custom Collector?


 Please let me know the best way to go about it, and any pointers on
 customizing Solr 4.


Thanks
Saroj


Re: How to do custom sorting in Solr?

2012-06-10 Thread Erick Erickson
Skimming this, two options come to mind:

1) Simply apply primary, secondary, etc. sorts, something like:
   sort=subcategory asc,markdown_or_regular desc,sort_order asc

2) You could also use grouping to arrange things in groups and sort within
   those groups. This has the advantage of returning some members
   of each of the top N groups in the result set, which makes it easier to
   get some of each group rather than having to analyze the whole list.
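To see what option 1 does, here is an in-memory Java analogue of that sort spec. It is a sketch, not SolrJ; the values "regular" and "markdown" for markdown_or_regular are assumed, and the class and record names are hypothetical. Because "regular" sorts after "markdown" lexicographically, the descending key puts regular products first, so markdown products land at the bottom of each sub-category without rewriting any sort order.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// In-memory analogue of:
//   sort=subcategory asc,markdown_or_regular desc,sort_order asc
// Assumption: markdown_or_regular holds the strings "regular" or "markdown".
final class MultiKeySort {

    record Doc(String subcategory, String markdownOrRegular, int sortOrder, String name) {}

    static final Comparator<Doc> SOLR_LIKE_SORT =
        Comparator.comparing(Doc::subcategory)                                // subcategory asc
            .thenComparing(Doc::markdownOrRegular, Comparator.reverseOrder()) // "regular" before "markdown"
            .thenComparingInt(Doc::sortOrder);                                // sort_order asc

    static List<String> sortedNames(List<Doc> docs) {
        return docs.stream().sorted(SOLR_LIKE_SORT).map(Doc::name).collect(Collectors.toList());
    }

    static List<Doc> sample() {
        return List.of(
            new Doc("Programming", "regular", 1, "P1"),
            new Doc("Programming", "markdown", 2, "P2"),
            new Doc("Programming", "regular", 3, "P3"),
            new Doc("Programming", "regular", 4, "P4"),
            new Doc("Fiction", "markdown", 1, "F1"),
            new Doc("Fiction", "regular", 2, "F2"),
            new Doc("Fiction", "regular", 3, "F3"),
            new Doc("Fiction", "markdown", 4, "F4"));
    }

    public static void main(String[] args) {
        System.out.println(sortedNames(sample())); // [F2, F3, F1, F4, P1, P3, P4, P2]
    }
}
```

Within each sub-category the markdown items keep their original relative order (F1 before F4), which the single "max value" key would not guarantee.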

But your example is somewhat contradictory. You say
"products which are on markdown, are at the bottom of the documents list",
but in your examples, products on markdown are intermingled.

Best
Erick



Re: How to do custom sorting in Solr?

2012-06-10 Thread roz dev
Thanks, Erick, for your quick feedback.

When products are assigned to a category or sub-category, they can be in any
order, and the price type can be regular or markdown. So regular and markdown
products are intermingled according to their assignment, but I want to sort
them so that all products on markdown are at the bottom of the list.

I could use these multiple sorts, but I realize that they are costly in terms
of heap, since they use the FieldCache.

I have an index with 2M docs, and the docs are pretty big, so I don't want to
use them unless there is no other option.

I am wondering if I can define a custom function query that works like
this:


   - check if the product is on markdown
   - if yes, change its sort order field to the max value in the
   given sub-category, say 99
   - else, use the sort order of the product in the sub-category
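The rule above needs the maximum sort order of each sub-category, which a per-document function does not know unless that value is also stored in the index. A variant that avoids the lookup is to add a fixed offset to markdown products. The Java sketch below shows that key; OFFSET is a hypothetical bound that must exceed every real sort order. In Solr this might be written as a function sort using the built-in sum and product functions over a hypothetical numeric 0/1 field is_markdown, e.g. sort=sum(product(is_markdown,1000000),sort_order) asc; treat that mapping as an untested assumption for your Solr version.

```java
// Per-document sort key for "markdown last", written as plain Java.
// Assumption: OFFSET is larger than any real sort order, so every markdown
// product sorts after every regular product in the same sub-category.
final class MarkdownLastKey {

    // Hypothetical bound; must exceed the largest sort order anywhere.
    static final long OFFSET = 1_000_000L;

    static long sortKey(boolean markdown, int sortOrder) {
        return (markdown ? OFFSET : 0L) + sortOrder;
    }

    public static void main(String[] args) {
        System.out.println(sortKey(false, 3)); // 3: regular product keeps its sort order
        System.out.println(sortKey(true, 2));  // 1000002: markdown product moves past all regular ones
    }
}
```

Unlike the "max value" rule, this keeps markdown products in their original relative order instead of giving them all the same key.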

I have been looking at the existing function queries but do not have a good
handle on how to write one of my own.

- Another option could be to use a custom sort comparator, but I am not sure
how it works.

Any thoughts?


-Saroj






Re: How to do custom sorting in Solr?

2012-06-10 Thread Erick Erickson
2M docs is actually pretty small. Sorting is sensitive to the number
of _unique_ values in the sort fields, not necessarily the number of
documents.

And sorting only works on fields with a single value (i.e. it can't have
more than one token after analysis). So for each field you're only talking
about 2M values at the very maximum, assuming that the field in question has
a unique value per document, which I doubt very much given your
problem description.

So with a corpus that size, I'd just try it.

Best
Erick



Re: How to do custom sorting in Solr?

2012-06-10 Thread roz dev
Yes, these documents have lots of unique values, since the same product can
be assigned to many other categories, each with a different sort order.

We evaluated heap usage and found that, with the kind of queries we
generate, heap usage was going up to 24-26 GB. I traced it to the fact that
the FieldCache creates an array of 2M entries for each of the sort fields.
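A quick way to check those numbers is plain arithmetic. The sketch below assumes each sort field costs one 4-byte int per document (one array of maxDoc entries); string sort fields also keep term bytes, so treat the result as a lower bound. At 2M docs that is 8,000,000 bytes (about 8 MB) per sort field, so 24 GB (decimal) corresponds to about 3,000 such arrays. The thread does not say how many sort fields the index actually has; the arithmetic only shows the scale involved. Class and method names are hypothetical.

```java
// Back-of-envelope estimate of per-field sort memory.
// Assumption: one int (4 bytes) per document per sort field.
final class SortMemoryEstimate {

    static long bytesPerSortField(long maxDoc, int bytesPerEntry) {
        return maxDoc * bytesPerEntry;
    }

    // How many such per-field arrays fit into a given heap size.
    static long fieldsForHeap(long heapBytes, long maxDoc, int bytesPerEntry) {
        return heapBytes / bytesPerSortField(maxDoc, bytesPerEntry);
    }

    public static void main(String[] args) {
        long maxDoc = 2_000_000L;
        long perField = bytesPerSortField(maxDoc, 4);            // 8,000,000 bytes
        long fields = fieldsForHeap(24_000_000_000L, maxDoc, 4);  // 3,000 fields
        System.out.println(perField + " bytes per sort field");
        System.out.println(fields + " such fields would fill 24 GB");
    }
}
```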

Since the same products are mapped to multiple categories, we incur
significant memory overhead. Therefore, any solution that reduces memory
consumption is a good one for me.

In fact, we have situations where the same product is mapped to more than
one sub-category within the same category, like:


Books
 -- Programming
  - Java in a nutshell
 -- Sale (40% off)
  - Java in a nutshell


So, another thought is to somehow use a second-pass collector to group books
appropriately into the Programming and Sale categories, with the right sort
order.
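Erick's grouping option maps to Solr's result-grouping request parameters (group, group.field, group.sort, group.limit). As far as I know, Lucene's grouping module is itself built on first-pass and second-pass collectors, so this may cover the "second pass" idea without custom code. The sketch below only builds the request's query string with the JDK's URLEncoder; the field names come from this thread, and group.limit=10 is an arbitrary illustrative value. Note that grouping by field generally expects a single-valued field, which matters here because the same product can sit in both Programming and Sale.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Builds the query string for a Solr result-grouping request: one group per
// subcategory, with markdown products sorted to the end of each group.
final class GroupingQuery {

    static String queryString() {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "*:*");
        params.put("fq", "category:Books");
        params.put("group", "true");
        params.put("group.field", "subcategory");                           // one group per sub-category
        params.put("group.sort", "markdown_or_regular desc,sort_order asc"); // order inside each group
        params.put("group.limit", "10");                                    // docs returned per group
        return params.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(queryString());
    }
}
```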

But I have no clue about that piece :(

-Saroj

