Re: Use function return value for range queries

2013-10-04 Thread Sandro Zbinden
Thanks for the quick answer. That's what I thought :-)

Is there any plan to add such functionality in the future, or is it completely
against the concept?

Bests Sandro


-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Friday, 4 October 2013 16:41
To: solr-user@lucene.apache.org
Subject: Re: Use function return value for range queries

I think the best you can do is compute sum(pricea,priceb) at index time as a 
third field, say priceSum, and then you can do a range query on that priceSum 
field.

It would be nice to be able to have a query that evaluates arbitrary 
expressions combining field values, but there is no such feature in Lucene at 
this time. FunctionQuery modifies the document score, but doesn't affect which 
documents are selected.

Function queries can be used to modify document scores and to return values, 
but not in the query itself to select documents.
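
For what it's worth, a minimal sketch of that index-time approach (the field
definition and core name below are assumptions):

<field name="priceSum" type="float" indexed="true" stored="true"/>

If you compute pricea + priceb into priceSum for each document when indexing,
the range query becomes an ordinary one:

http://localhost:8983/solr/collection1/select?q=*:*&fq=priceSum:[0 TO 5]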

-- Jack Krupansky

-Original Message-
From: SandroZbinden
Sent: Friday, October 04, 2013 10:21 AM
To: solr-user@lucene.apache.org
Subject: Use function return value for range queries

Is there a way to use the function return value for a range query

For example: I have two price fields, pricea and priceb, and now I want to get
the documents where the sum of pricea and priceb is in [0 TO 5].

Something like select?q={!func}sum(pricea,priceb):[0 TO 5]

I can't calculate this at index time.

Bests Sandro








Re: Solr grouping performance

2013-10-03 Thread Sandro Zbinden
Hey Alok

I don't think grouping performance is bad on integer fields, but trying to load
all results in one query consumes a lot of memory.

Did you try loading only the first 1000 results with rows=1000 and start=0?

To get the total group count you can use the group.ngroups=true param.
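
For example (field name assumed):

q=*:*&group=true&group.field=myfield&group.ngroups=true&start=0&rows=1000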

Bests

Sandro Zbinden

-Original Message-
From: Alok Bhandari [mailto:alokomprakashbhand...@gmail.com]
Sent: Thursday, 3 October 2013 12:31
To: solr-user@lucene.apache.org
Subject: Solr grouping performance

Hello,
I am using Solr 4.0. I want to group entries on one of the int fields; I need
all the groups, and group.limit is 1. I am getting very slow performance, and
sometimes I also get an OutOfMemory error.

My index has 20 million records, of which my search returns 1 million
documents, and I do grouping on those 1 million docs. The size of the data on
disk is approx. 2 GB and I have Xmx set to 2 GB. Can anyone please help me out
with this? Performance is very slow; it takes 10-12 seconds.

Thanks ,

Alok





Re: Exact Date Search

2013-10-03 Thread Sandro Zbinden
Hey Soumik

Have you read the http://wiki.apache.org/solr/SolrQuerySyntax page? It has some
examples with dates.

It is important that you index your field as a Solr date field:
<fieldType name="date" class="solr.DateField"/>


1. Exact match: q=modify_date:"2012-07-06T09:23:43Z"
2. Less than: q=modify_date:{* TO 2012-07-06T09:23:43Z}
3. More than: q=modify_date:{2012-07-06T09:23:43Z TO *}

4. Less than or equal: q=modify_date:[* TO 2012-07-06T09:23:43Z]
5. More than or equal: q=modify_date:[2012-07-06T09:23:43Z TO *]

(Note the two-digit hour, and the quotes around the exact-match value - without
them the colons in the date would confuse the query parser.)
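
For example, the exact match with curl (core name assumed; %22 is the
URL-encoded quote):

curl 'http://localhost:8983/solr/collection1/select?q=modify_date:%222012-07-06T09:23:43Z%22'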

Bests Sandro Zbinden


-Original Message-
From: soumikghosh05 [mailto:soumikghos...@yahoo.co.in]
Sent: Thursday, 3 October 2013 10:19
To: solr-user@lucene.apache.org
Subject: Exact Date Search

I have a date field modify_date, and the field values are 2012-08-09T11:23:43Z,
2011-09-02T12:23:43Z and 2012-07-06T09:23:43Z for 3 docs.

The user provides a date in the search form, for example 2012-07-06T09:23:43Z.

1. I want to get the docs whose modify date matches the user-supplied date.
2. I want to get the docs whose modify date is less than the user-supplied
date.
3. I want to get the docs whose modify date is greater than the user-supplied
date.

I am using solr.DateField as the field type.

I am very new to Solr and I need the query syntax for the above 3 queries.

Any help would be greatly appreciated.

Thanks,
Soumik





Re: Re: Solr grouping performance

2013-10-03 Thread Sandro Zbinden
I can't say much about the performance, but what I would recommend is to loop
over the results.

The first query gives you the number of groups (ngroups):

q=*:*&group=true&group.ngroups=true&group.field=myfield&start=0&rows=1

After that you execute the other queries, simply changing the start param:

q=*:*&group=true&group.field=myfield&start=1&rows=1
q=*:*&group=true&group.field=myfield&start=2&rows=1
q=*:*&group=true&group.field=myfield&start=3&rows=1

and so on...

I think this way you can avoid the Solr out-of-memory exception.
PS: But be careful: if you store all the rows on the client side, you need
memory there too :-)
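
In SolrJ (4.x) the loop could look roughly like this - a sketch, with the
server URL and field name assumed:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.GroupCommand;
import org.apache.solr.client.solrj.response.QueryResponse;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

// First query: fetch one group plus the total group count (group.ngroups=true).
SolrQuery query = new SolrQuery("*:*");
query.set("group", true);
query.set("group.field", "myfield");
query.set("group.ngroups", true);
query.setStart(0);
query.setRows(1);
GroupCommand first = server.query(query).getGroupResponse().getValues().get(0);
int ngroups = first.getNGroups();

// Follow-up queries: walk the remaining groups one at a time via the start param.
for (int start = 1; start < ngroups; start++) {
    query.setStart(start);
    QueryResponse rsp = server.query(query);
    // process rsp.getGroupResponse().getValues().get(0).getValues() here
}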

Bests Sandro


-Original Message-
From: Alok Bhandari [mailto:alokomprakashbhand...@gmail.com]
Sent: Thursday, 3 October 2013 13:02
To: solr-user@lucene.apache.org
Subject: Re: Re: Solr grouping performance

Thanks for the reply, Sandro.

My requirement is that I need all groups and then build compact data from them
to send to the server. I am not sure how much RAM should be allocated to the
JVM instance to make it serve requests faster; any input on that is welcome.





Re: Facet Sort with non-ASCII Characters

2013-09-10 Thread Sandro Zbinden
Hey Yonik

I installed the latest Solr (4.4) and started the Jetty configured in the
example directory.

To the core collection1 I added three titles: a, b, ä.

curl http://localhost:8983/solr/update/json -H 'Content-type:application/json'
-d '[{"id" : "1", "title" : "a"},{"id" : "2", "title" : "ä"},{"id" : "3",
"title" : "b"}]'

Now I want to sort these three titles with the following query:

http://localhost:8983/solr/collection1/select?q=*:*&facet=true&facet.sort=index&facet.field=title&rows=0

I expect:

<lst name="title">
  <int name="a">1</int>
  <int name="ä">1</int>
  <int name="b">1</int>
</lst>

But I receive:

<lst name="title">
  <int name="a">1</int>
  <int name="b">1</int>
  <int name="ä">1</int>
</lst>

PS: In Java I would sort these values with a Comparator that uses
Collator.getInstance().compare(value1, value2);
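
For example, a small sketch of that (the locale is an assumption):

import java.text.Collator;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

List<String> titles = Arrays.asList("a", "b", "ä");
// A locale-aware Collator sorts ä next to a, unlike raw code-point order.
Collections.sort(titles, Collator.getInstance(Locale.GERMAN));
System.out.println(titles); // [a, ä, b]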

Best regards 

Sandro

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On behalf of Yonik Seeley
Sent: Monday, 9 September 2013 21:26
To: solr-user@lucene.apache.org
Subject: Re: Facet Sort with non-ASCII Characters

On Mon, Sep 9, 2013 at 7:16 AM, Sandro Zbinden zbin...@imagic.ch wrote:
 Is there a plan to add support for alphabetical facet sorting with non-ASCII
 characters?

The entire Unicode range should already work. Can you give an example of what
you would like to see?

-Yonik
http://lucidworks.com


Re: Facet sort descending

2013-09-10 Thread Sandro Zbinden
Hi

@Peter This is actually the requirement we have. For both sort options (index,
count) we would like the possibility to add a desc option.

Instead of this result for
q=*:*&facet=true&facet.field=image_text&facet.sort=index&rows=0

<lst name="facet_fields">
  <lst name="image_text">
    <int name="a">12</int>
    <int name="b">23</int>
    <int name="c">200</int>
  </lst>
</lst>

we would like to add desc to the sort option, like facet.sort=index,desc, to
get the following result:

<lst name="facet_fields">
  <lst name="image_text">
    <int name="c">200</int>
    <int name="b">23</int>
    <int name="a">12</int>
  </lst>
</lst>

Bests Sandro

 
-Original Message-
From: Peter Sturge [mailto:peter.stu...@gmail.com]
Sent: Tuesday, 10 September 2013 11:17
To: solr-user@lucene.apache.org
Subject: Re: Facet sort descending

Hi,

This question could possibly be about rarest-N facet counting - i.e. returning
the facet counts with the least values.
I remember doing a patch for this years ago, but it broke when some
UninvertedField facet optimization came in around the 3.5 timeframe.
It's a neat idea, though, to have an option to show the 'rarest N' facets, not
just the 'top N'.

Thanks,
Peter



On Mon, Sep 9, 2013 at 11:43 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 : Is there a plan to add a descending sort order for facet queries ?
 : Best regards Sandro

 I don't understand your question.

 if you specify multiple facet.query params, then the constraint counts 
 are returned in the order they were initially specified -- there is no 
 need for server side sorting, because they all come back (as opposed 
 to facet.field where the number of constraints can be unbounded and 
 you may request just the top X using facet.limit)

 If you are asking about facet.field and using facet.sort to specify
 the order of the constraints for each field, then no -- I don't
 believe anyone is currently working on adding options for descending sort.

 I don't think it would be hard to add if someone wanted to ... I just 
 don't know that there has ever been enough demand for anyone to look 
 into it.


 -Hoss



Re: Facet sort descending

2013-09-10 Thread Sandro Zbinden
Hey Peter

Sorting these on the client side is no problem, but we have a problem using
pivot facet queries: if we set facet.limit=-1, the load can cause
OutOfMemory errors on the server side.
Thanks again for the patch. I will keep an eye on it.

Sandro 

-Original Message-
From: Peter Sturge [mailto:peter.stu...@gmail.com]
Sent: Tuesday, 10 September 2013 13:39
To: solr-user@lucene.apache.org
Subject: Re: Facet sort descending

Hi Sandro,
Ah, ok, this is quite simple then - you should be able to sort these any way 
you like in your client code since the facet data is all there.
On the server-side, you can look at
https://issues.apache.org/jira/browse/SOLR-1672 - please note this is an old 
patch for 1.4, so this won't work on 4.x - but it can give an idea of how/where 
to do the sorting on the server-side, if you want to go down that road.
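For the index,desc case, a client-side sketch in SolrJ (the QueryResponse
variable rsp is assumed):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.solr.client.solrj.response.FacetField;

// rsp is the response of q=*:*&facet=true&facet.field=image_text&facet.sort=index
List<FacetField.Count> counts =
    new ArrayList<FacetField.Count>(rsp.getFacetField("image_text").getValues());
// facet.sort=index returns ascending index order; reversing gives index,desc.
Collections.reverse(counts);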
HTH
Peter



On Tue, Sep 10, 2013 at 11:49 AM, Sandro Zbinden zbin...@imagic.ch wrote:

 Hi

 @Peter This is actually the requirement we have. For both sort
 options (index, count) we would like the possibility to add a desc option.

 Instead of this result for
 q=*:*&facet=true&facet.field=image_text&facet.sort=index&rows=0

 <lst name="facet_fields">
   <lst name="image_text">
     <int name="a">12</int>
     <int name="b">23</int>
     <int name="c">200</int>
   </lst>
 </lst>

 we would like to add desc to the sort option, like
 facet.sort=index,desc, to get the following result:

 <lst name="facet_fields">
   <lst name="image_text">
     <int name="c">200</int>
     <int name="b">23</int>
     <int name="a">12</int>
   </lst>
 </lst>

 Bests Sandro


 -Original Message-
 From: Peter Sturge [mailto:peter.stu...@gmail.com]
 Sent: Tuesday, 10 September 2013 11:17
 To: solr-user@lucene.apache.org
 Subject: Re: Facet sort descending

 Hi,

 This question could possibly be about rarest-N facet counting - i.e.
 returning the facet counts with the least values.
 I remember doing a patch for this years ago, but it broke when
 some UninvertedField facet optimization came in around the 3.5 timeframe.
 It's a neat idea, though, to have an option to show the 'rarest N'
 facets, not just the 'top N'.

 Thanks,
 Peter



 On Mon, Sep 9, 2013 at 11:43 PM, Chris Hostetter
 hossman_luc...@fucit.org wrote:

 
  : Is there a plan to add a descending sort order for facet queries ?
  : Best regards Sandro
 
  I don't understand your question.
 
  if you specify multiple facet.query params, then the constraint 
  counts are returned in the order they were initially specified -- 
  there is no need for server side sorting, because they all come back 
  (as opposed to facet.field where the number of constraints can be 
  unbounded and you may request just the top X using facet.limit)
 
  If you are asking about facet.field and using facet.sort to specify
  the order of the constraints for each field, then no -- I don't
  believe anyone is currently working on adding options for descending
 sort.
 
  I don't think it would be hard to add if someone wanted to ... I 
  just don't know that there has ever been enough demand for anyone to 
  look into it.
 
 
  -Hoss
 



Facet Sort with non-ASCII Characters

2013-09-09 Thread Sandro Zbinden
Dear solr users

Is there a plan to add support for alphabetical facet sorting with non-ASCII
characters?

Best regards Sandro



Sandro Zbinden
Software Engineer





Facet sort descending

2013-09-09 Thread Sandro Zbinden
Dear solr users

Is there a plan to add a descending sort order for facet queries?
Best regards Sandro


Sandro Zbinden
Software Engineer





Transaction log on-disk guarantees

2013-08-27 Thread Sandro Zbinden
Dear solr users

We are using the Solr soft commit feature, and we are worried about what
happens after we restart the Solr server.

Can we activate the transaction log to have on-disk guarantees and then use the
Solr soft commit feature?

Thanks and Best regards

Sandro Zbinden


Sandro Zbinden
Software Engineer




Re: Transaction log on-disk guarantees

2013-08-27 Thread Sandro Zbinden
Hey Mark

Thank you very much for the quick answer. We have a single-node environment.

I tried to find the fsync option but was not successful; I ended up in the
UpdateLog class :-)

How do I enable fsync in solrconfig.xml?


Besides that:

If the Solr soft commit feature has an on-disk guarantee with a transaction
log, why don't we use soft commit as the default commit option?


-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Tuesday, 27 August 2013 17:12
To: solr-user@lucene.apache.org
Subject: Re: Transaction log on-disk guarantees


On Aug 27, 2013, at 11:08 AM, Sandro Zbinden zbin...@imagic.ch wrote:

 Can we activate the transaction log to have on-disk guarantees and then use
 the Solr soft commit feature?

Yes you can. If you only have a single node (no replication), you probably want 
to turn on fsync via the config.

- Mark



Re: Transaction log on-disk guarantees

2013-08-27 Thread Sandro Zbinden
@Mark Do you know how I can set the syncLevel to fsync in solrconfig.xml? I
can't find it in the default solrconfig.xml:

https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/collection1/conf/solrconfig.xml

The blog post at
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
says that enabling fsync is not a big increase in update time (a few
milliseconds, say 10-50 ms), so I think it is useful to turn on fsync.

On Aug 27, 2013, at 11:54 AM, Erick Erickson erickerick...@gmail.com wrote:

 Soft commits flush to the OS, so a JVM crash/termination
 shouldn't affect it anyway.

A soft commit is not a hard commit, so there are no guarantees like this. It
searches committed and non-committed segments - non-committed segments will not
magically be committed after a JVM crash.

 Turning on the fsync
 would just be a little bit of extra protection..

If you don't have replication, it turns on strong 'durability' promises.
Without it, you are on your own if you have a hard machine reset. If durability
is important to you and you don't have replication, it's important to use the
fsync option here. Unless you have a great, long-lasting battery backup and/or
an env such that hard resets don't concern you for some reason. It comes down
to your requirements.

Responses to Sandro inline below:

On Aug 27, 2013, at 11:43 AM, Sandro Zbinden zbin...@imagic.ch wrote:

 Hey Mark
 
 Thank you very much for the quick answer. We have a single node environment.
 
 I try to find the fsync option but was not successful. Ended up in the 
 UpdateLog class :-)
 
 How do I enable fsync in the solrconfig.xml ?

In the updateLog config, it's a syncLevel=fsync param.
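
Something like this should do it (the dir line is taken from the stock config;
syncLevel is the param the UpdateLog reads at init):

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <str name="syncLevel">fsync</str>
</updateLog>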

 
 
 Besides that:
 
 If the Solr soft commit feature has an on-disk guarantee with a transaction
 log, why don't we use soft commit as the default commit option?

Yes, for visibility you should use soft commit. You should also have an auto
hard commit with openSearcher=false - it's just about flushing the transaction
log and freeing memory in this configuration - which is why it makes sense to
simply turn on the auto commit for regular hard commits. You may or may not
want to use auto soft commits.
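
For example, a configuration along these lines (the times are only
illustrative):

<autoCommit>
  <maxTime>60000</maxTime> <!-- hard commit every 60s: flushes the tlog, no new searcher -->
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>1000</maxTime> <!-- soft commit every 1s for visibility -->
</autoSoftCommit>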

- Mark

 
 
 -Original Message-
 From: Mark Miller [mailto:markrmil...@gmail.com]
 Sent: Tuesday, 27 August 2013 17:12
 To: solr-user@lucene.apache.org
 Subject: Re: Transaction log on-disk guarantees
 
 
 On Aug 27, 2013, at 11:08 AM, Sandro Zbinden zbin...@imagic.ch wrote:
 
 Can we activate the transaction log to have on-disk guarantees and then use
 the Solr soft commit feature?
 
 Yes you can. If you only have a single node (no replication), you probably 
 want to turn on fsync via the config.
 
 - Mark
 



Solr show total row count in response of full import

2013-07-31 Thread Sandro Zbinden
Hey there

Is there a way to show the total row count (documents that will be inserted)
when executing a full import through the DataImport request handler?

Currently, after executing a full import and pointing to solrcore/dataimport,
you can get the total rows processed:

<str name="Total Documents Processed">6354</str>

It would be nice if you could receive a total row count like

<str name="Total Documents">10100</str>

With this information we could add another piece of information like

<str name="Imported in Percent">62.91</str>

This would make it easier to generate a progress bar for the end user.


Best regards

Sandro Zbinden



Re: Avoid Solr Pivot Faceting Out of Memory / Shorter result for pivot faceting requests with facet.pivot.ngroup=true and facet.pivot.showLastList=false

2013-07-26 Thread Sandro Zbinden
Hey Erick

Thank you very much for your help.

So I dived into the Solr code and read the
http://wiki.apache.org/solr/HowToContribute section. Really informative :-)

I created a JIRA issue about my problem and attached a patch file with an
implementation of pivot faceting with the ngroup and showLastList options.

Here is the link to the JIRA task:

https://issues.apache.org/jira/browse/SOLR-5079

Best Regards Sandro


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Sunday, 21 July 2013 14:59
To: solr-user@lucene.apache.org
Subject: Re: Avoid Solr Pivot Faceting Out of Memory / Shorter result for pivot
faceting requests with facet.pivot.ngroup=true and
facet.pivot.showLastList=false

Sorry, life's been really hectic lately. I don't know the pivot code, so can't 
make much of a comment on that. But when it comes to code changes, it's 
perfectly reasonable to open up a JIRA and attach the code as a patch. You 
might have to nudge people a bit to get them to carry it forward...

The case will be strengthened if you can say that all the tests pass with your
patch. If the tests don't pass, that may point to issues with your patch; take
a quick look at the tests that fail and see if they're related to your changes.

Start here:
http://wiki.apache.org/solr/HowToContribute

Best
Erick

On Fri, Jul 19, 2013 at 9:25 AM, Sandro Zbinden zbin...@imagic.ch wrote:
 Dear Members,

 Do you guys think I am better off in the Solr developer group with this
 question?

 To summarize, I would like to add a facet.pivot.ngroup=true param to
 show the count of the facet list. Further, I would like to avoid
 out-of-memory exceptions by reducing the result of a facet.pivot query.

 Best Regards

 Sandro Zbinden


 -Original Message-
 From: Sandro Zbinden [mailto:zbin...@imagic.ch]
 Sent: Wednesday, 17 July 2013 13:45
 To: solr-user@lucene.apache.org
 Subject: Avoid Solr Pivot Faceting Out of Memory / Shorter result for
 pivot faceting requests with facet.pivot.ngroup=true and
 facet.pivot.showLastList=false

 Dear Usergroup


 I am getting an out-of-memory exception in the following scenario.
 I have 4 SQL tables - patient, visit, study and image - that are
 denormalized for the Solr index. The Solr index looks like the following:

 ----------------------------------------------
 | p_id | p_lastname | v_id | v_name   | ...
 ----------------------------------------------
 | 1    | Miller     | 10   | Study 1  | ...
 | 2    | Miller     | 11   | Study 2  | ...
 | 2    | Miller     | 12   | Study 3  | ...  <-- duplication because of denormalization
 | 3    | Smith      | 13   | Study 4  | ...
 ----------------------------------------------

 Now I am executing a facet query:

 q=*:*&facet=true&facet.pivot=p_lastname,p_id&facet.limit=-1

 And I get the following result:

 <lst>
   <str name="field">p_lastname</str>
   <str name="value">Miller</str>
   <int name="count">3</int>
   <arr name="pivot">
     <lst>
       <str name="field">p_id</str>
       <int name="value">1</int>
       <int name="count">1</int>
     </lst>
     <lst>
       <str name="field">p_id</str>
       <int name="value">2</int>
       <int name="count">2</int>
     </lst>
   </arr>
 </lst>
 <lst>
   <str name="field">p_lastname</str>
   <str name="value">Smith</str>
   <int name="count">1</int>
   <arr name="pivot">
     <lst>
       <str name="field">p_id</str>
       <int name="value">3</int>
       <int name="count">1</int>
     </lst>
   </arr>
 </lst>


 The goal is to show our clients a list of the group value and in parentheses
 how many patients the group contains:
 - Miller (2)
 - Smith (1)

 This is why we need to use the facet.pivot method with facet.limit=-1. It is
 as far as I know the only way to get a grouping for two criteria.
 And we need the pivot list to count how many patients are in a group.


 Currently this works well on smaller indexes, but if we have around 1'000'000
 patients and we execute a query like the one above, we run into an out of memory.
 I figured out that the problem is not the calculation of the pivot but the
 presentation of the result.
 Because we load all fields (we cannot use facet.offset because we need to
 order the results ascending and descending), the result can get really big.

 To avoid this overload I made a change in the solr-core
 PivotFacetHandler.java class.
 In the method doPivots I added the following code:

    NamedList<Integer> nl = this.getTermCounts(subField);
    pivot.add("ngroups", nl.size());

 This gives me the group size of the list.
 Then I removed the recursive call pivot.add("pivot", doPivots(nl,
 subField, nextField, fnames, subset)); With this my result looks like
 the following:

 <lst>
   <str name="field">p_lastname</str>
   <str name="value">Miller</str>
   <int name="count">3</int>
   <int name="ngroup">2</int>
 </lst>
 <lst>
   <str name="field">p_lastname</str>
   <str name="value">Smith</str>
   <int name="count">1</int>
   <int name="ngroup">1</int>
 </lst>


 My question now is whether there is already something planned like
 facet.pivot.ngroup=true and facet.pivot.showLastList=false to improve the
 performance of pivot faceting.

Re: Avoid Solr Pivot Faceting Out of Memory / Shorter result for pivot faceting requests with facet.pivot.ngroup=true and facet.pivot.showLastList=false

2013-07-19 Thread Sandro Zbinden
Dear Members,

Do you guys think I am better off in the Solr developer group with this
question?

To summarize, I would like to add a facet.pivot.ngroup=true param to show the
count of the facet list. Further, I would like to avoid out-of-memory
exceptions by reducing the result of a facet.pivot query.

Best Regards 

Sandro Zbinden


-Original Message-
From: Sandro Zbinden [mailto:zbin...@imagic.ch]
Sent: Wednesday, 17 July 2013 13:45
To: solr-user@lucene.apache.org
Subject: Avoid Solr Pivot Faceting Out of Memory / Shorter result for pivot
faceting requests with facet.pivot.ngroup=true and
facet.pivot.showLastList=false

Dear Usergroup


I am getting an out-of-memory exception in the following scenario.
I have 4 SQL tables - patient, visit, study and image - that are denormalized
for the Solr index. The Solr index looks like the following:

----------------------------------------------
| p_id | p_lastname | v_id | v_name   | ...
----------------------------------------------
| 1    | Miller     | 10   | Study 1  | ...
| 2    | Miller     | 11   | Study 2  | ...
| 2    | Miller     | 12   | Study 3  | ...  <-- duplication because of denormalization
| 3    | Smith      | 13   | Study 4  | ...
----------------------------------------------

Now I am executing a facet query:

q=*:*&facet=true&facet.pivot=p_lastname,p_id&facet.limit=-1

And I get the following result:

<lst>
  <str name="field">p_lastname</str>
  <str name="value">Miller</str>
  <int name="count">3</int>
  <arr name="pivot">
    <lst>
      <str name="field">p_id</str>
      <int name="value">1</int>
      <int name="count">1</int>
    </lst>
    <lst>
      <str name="field">p_id</str>
      <int name="value">2</int>
      <int name="count">2</int>
    </lst>
  </arr>
</lst>
<lst>
  <str name="field">p_lastname</str>
  <str name="value">Smith</str>
  <int name="count">1</int>
  <arr name="pivot">
    <lst>
      <str name="field">p_id</str>
      <int name="value">3</int>
      <int name="count">1</int>
    </lst>
  </arr>
</lst>


The goal is to show our clients a list of the group value and in parentheses
how many patients the group contains:
- Miller (2)
- Smith (1)

This is why we need to use the facet.pivot method with facet.limit=-1. It is as
far as I know the only way to get a grouping for two criteria.
And we need the pivot list to count how many patients are in a group.


Currently this works well on smaller indexes, but if we have around 1'000'000
patients and we execute a query like the one above, we run into an out of memory.
I figured out that the problem is not the calculation of the pivot but the
presentation of the result.
Because we load all fields (we cannot use facet.offset because we need to order
the results ascending and descending), the result can get really big.

To avoid this overload I made a change in the solr-core
PivotFacetHandler.java class.
In the method doPivots I added the following code:

   NamedList<Integer> nl = this.getTermCounts(subField);
   pivot.add("ngroups", nl.size());

This gives me the group size of the list.
Then I removed the recursive call pivot.add("pivot", doPivots(nl, subField,
nextField, fnames, subset)); With this my result looks like the following:

<lst>
  <str name="field">p_lastname</str>
  <str name="value">Miller</str>
  <int name="count">3</int>
  <int name="ngroup">2</int>
</lst>
<lst>
  <str name="field">p_lastname</str>
  <str name="value">Smith</str>
  <int name="count">1</int>
  <int name="ngroup">1</int>
</lst>


My question now is whether there is already something planned like
facet.pivot.ngroup=true and facet.pivot.showLastList=false to improve the
performance of pivot faceting.

Is there a chance we could get this into the Solr code? I think it's a really
small change to the code but could improve the product enormously.

Best Regards

Sandro Zbinden



Avoid Solr Pivot Faceting Out of Memory / Shorter result for pivot faceting requests with facet.pivot.ngroup=true and facet.pivot.showLastList=false

2013-07-17 Thread Sandro Zbinden
Dear Usergroup


I am getting an out-of-memory exception in the following scenario.
I have 4 SQL tables - patient, visit, study and image - that are denormalized
for the Solr index.
The Solr index looks like the following:

----------------------------------------------
| p_id | p_lastname | v_id | v_name   | ...
----------------------------------------------
| 1    | Miller     | 10   | Study 1  | ...
| 2    | Miller     | 11   | Study 2  | ...
| 2    | Miller     | 12   | Study 3  | ...  <-- duplication because of denormalization
| 3    | Smith      | 13   | Study 4  | ...
----------------------------------------------

Now I am executing a facet query:

q=*:*&facet=true&facet.pivot=p_lastname,p_id&facet.limit=-1

And I get the following result:

<lst>
  <str name="field">p_lastname</str>
  <str name="value">Miller</str>
  <int name="count">3</int>
  <arr name="pivot">
    <lst>
      <str name="field">p_id</str>
      <int name="value">1</int>
      <int name="count">1</int>
    </lst>
    <lst>
      <str name="field">p_id</str>
      <int name="value">2</int>
      <int name="count">2</int>
    </lst>
  </arr>
</lst>
<lst>
  <str name="field">p_lastname</str>
  <str name="value">Smith</str>
  <int name="count">1</int>
  <arr name="pivot">
    <lst>
      <str name="field">p_id</str>
      <int name="value">3</int>
      <int name="count">1</int>
    </lst>
  </arr>
</lst>


The goal is to show our clients a list of the group value and in parentheses
how many patients the group contains:
- Miller (2)
- Smith (1)

This is why we need to use the facet.pivot method with facet.limit=-1. It is as
far as I know the only way to get a grouping for two criteria.
And we need the pivot list to count how many patients are in a group.
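
For reference, a client-side SolrJ sketch of how those counts can be derived
from the pivot response (the QueryResponse variable rsp and the
facet.pivot=p_lastname,p_id request are assumed):

import java.util.List;
import org.apache.solr.client.solrj.response.PivotField;
import org.apache.solr.common.util.NamedList;

NamedList<List<PivotField>> pivots = rsp.getFacetPivot();
for (PivotField pf : pivots.get("p_lastname,p_id")) {
    // Number of distinct p_id values under this last name = patients in the group.
    int patients = pf.getPivot() == null ? 0 : pf.getPivot().size();
    System.out.println(pf.getValue() + " (" + patients + ")");
}

This works, but it forces the full pivot tree over the wire, which is exactly
the overload described below.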


Currently this works well on smaller indexes, but if we have around 1'000'000
patients and we execute a query like the one above, we run into an out of memory.
I figured out that the problem is not the calculation of the pivot but the
presentation of the result.
Because we load all fields (we cannot use facet.offset because we need to order
the results ascending and descending), the result can get really big.

To avoid this overload I made a change in the solr-core
PivotFacetHandler.java class.
In the method doPivots I added the following code:

   NamedList<Integer> nl = this.getTermCounts(subField);
   pivot.add("ngroups", nl.size());

This gives me the group size of the list.
Then I removed the recursive call pivot.add("pivot", doPivots(nl, subField,
nextField, fnames, subset));
With this my result looks like the following:

<lst>
  <str name="field">p_lastname</str>
  <str name="value">Miller</str>
  <int name="count">3</int>
  <int name="ngroup">2</int>
</lst>
<lst>
  <str name="field">p_lastname</str>
  <str name="value">Smith</str>
  <int name="count">1</int>
  <int name="ngroup">1</int>
</lst>


My question now is whether there is already something planned like
facet.pivot.ngroup=true and facet.pivot.showLastList=false to improve the
performance of pivot faceting.

Is there a chance we could get this into the Solr code? I think it's a really
small change to the code but could improve the product enormously.

Best Regards

Sandro Zbinden