Re: Exception using distributed field-collapsing
Hi Bryan,

What is the field type of the groupField? You can only group by a field of type string, as described in the wiki:
http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters

When you group by another field type, an HTTP 400 should be returned instead of this error. At least that's what I'd expect.

Martijn

On 20 June 2012 20:37, Bryan Loofbourrow bloofbour...@knowledgemosaic.com wrote:

I am doing a search on three shards with identical schemas (I double-checked!), using the group feature, and Solr/Lucene 3.5. Solr is giving me back the exception listed at the bottom of this email.

Other information: my schema uses the following field types: StrField, DateField, TrieDateField, TextField, SortableInt, SortableLong, BoolField.

My query looks like this (I've messed with it to anonymize but, I hope, kept the essentials):

http://[solr core2]/select/?start=0&rows=25&q={!qsol}machines&sort=[sort field]&fl=[list of fields]&shards=[solr core1]%2c[solr core2]%2c[solr core3]&group=true&group.field=[group field]

java.lang.ClassCastException: java.util.Date cannot be cast to java.lang.String
    at org.apache.lucene.search.FieldComparator$StringOrdValComparator.compareValues(FieldComparator.java:844)
    at org.apache.lucene.search.grouping.SearchGroup$GroupComparator.compare(SearchGroup.java:180)
    at org.apache.lucene.search.grouping.SearchGroup$GroupComparator.compare(SearchGroup.java:154)
    at java.util.TreeMap.put(TreeMap.java:547)
    at java.util.TreeSet.add(TreeSet.java:255)
    at org.apache.lucene.search.grouping.SearchGroup$GroupMerger.updateNextGroup(SearchGroup.java:222)
    at org.apache.lucene.search.grouping.SearchGroup$GroupMerger.merge(SearchGroup.java:285)
    at org.apache.lucene.search.grouping.SearchGroup.merge(SearchGroup.java:340)
    at org.apache.solr.search.grouping.distributed.responseprocessor.SearchGroupShardResponseProcessor.process(SearchGroupShardResponseProcessor.java:77)
    at org.apache.solr.handler.component.QueryComponent.handleGroupedResponses(QueryComponent.java:565)
    at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:548)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:289)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:679)

Any thoughts or advice? Thanks,

-- Bryan

--
Kind regards,

Martijn van Groningen
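To see why this surfaces as a ClassCastException during the merge rather than as an up-front validation error, here is a rough Python sketch (illustrative only, not Solr code; function names are made up): the distributed first-pass merge keeps candidate groups in a sorted structure whose comparator assumes string group values, so a shard that sends back date group values blows up in the middle of the merge.

```python
import datetime
from functools import cmp_to_key

def compare_group_values(a, b):
    # Mirrors the role of StringOrdValComparator.compareValues in the
    # trace above: both sides are assumed to be strings. A date group
    # value (from grouping on a date field) fails at this point.
    if not (isinstance(a, str) and isinstance(b, str)):
        raise TypeError("group value is not a string: %r / %r" % (a, b))
    return (a > b) - (a < b)

def merge_top_groups(*shard_groups):
    # First-pass merge: combine each shard's top group values into one
    # globally ordered list, comparing them with the string comparator.
    combined = {g for shard in shard_groups for g in shard}
    return sorted(combined, key=cmp_to_key(compare_group_values))
```

Grouping on a string field merges cleanly, while a shard contributing a `datetime` group value raises at merge time, which is consistent with the stack trace above.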
Re: Issue with field collapsing in solr 4 while performing distributed search
ngroups returns the number of groups that matched the query. However, if you want ngroups to be correct in a distributed environment, you need to put documents belonging to the same group into the same shard. Groups can't cross shard boundaries. I guess you need to do some manual document partitioning.

Martijn

On 11 June 2012 14:29, Nitesh Nandy niteshna...@gmail.com wrote:

Version: Solr 4.0 (svn build 30th May, 2012) with SolrCloud (2 slices and 2 shards)

The setup was done as per the wiki: http://wiki.apache.org/solr/SolrCloud

We are doing distributed search. While querying, we use field collapsing with ngroups set to true, as we need the number of search results. However, there is a difference between the number of results returned and the ngroups value.

Ex: http://localhost:8983/solr/select?q=message:blah%20AND%20userid:3&group=true&group.field=id&group.ngroups=true

The response XML looks like:

<response>
  <script/>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">46</int>
    <lst name="params">
      <str name="group.field">id</str>
      <str name="group.ngroups">true</str>
      <str name="group">true</str>
      <str name="q">messagebody:monit AND usergroupid:3</str>
    </lst>
  </lst>
  <lst name="grouped">
    <lst name="id">
      <int name="matches">10</int>
      <int name="ngroups">9</int>
      <arr name="groups">
        <lst>
          <str name="groupValue">320043</str>
          <result name="doclist" numFound="1" start="0"><doc>...</doc></result>
        </lst>
        <lst>
          <str name="groupValue">398807</str>
          <result name="doclist" numFound="5" start="0" maxScore="2.4154348">...</result>
        </lst>
        <lst>
          <str name="groupValue">346878</str>
          <result name="doclist" numFound="2" start="0">...</result>
        </lst>
        <lst>
          <str name="groupValue">346880</str>
          <result name="doclist" numFound="2" start="0">...</result>
        </lst>
      </arr>
    </lst>
  </lst>
</response>

So you can see that the ngroups value returned is 9, while the actual number of groups returned is 4. Why do we have this discrepancy between ngroups, matches and the actual number of groups? Is this an open issue?

Any kind of help is appreciated.

--
Regards,
Nitesh Nandy

--
Kind regards,

Martijn van Groningen
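The "manual document partitioning" Martijn suggests can be sketched roughly like this (illustrative Python; the field names are hypothetical): route each document by its group key instead of its unique id, so a group never spans shards and ngroups adds up correctly.

```python
def shard_for(doc, num_shards, group_field):
    # Route by the grouping key (not the unique document id), so all
    # members of one group land on the same shard.
    return hash(doc[group_field]) % num_shards

def partition(docs, num_shards, group_field):
    # Bucket documents into num_shards shards by their group key.
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for(doc, num_shards, group_field)].append(doc)
    return shards
```

With this routing, every group lives entirely on one shard, which is the precondition for an accurate distributed ngroups.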
Re: Grouping ngroups count
Hi Francois,

The issue you describe looks similar to an issue we have fixed before with the matches count. Open an issue and we can look into it.

Martijn

On 1 May 2012 20:14, Francois Perron francois.per...@wantedanalytics.com wrote:

Thanks for your response Cody,

First, I used distributed grouping on 2 shards and I'm sure that all documents of each group are in the same shard. I took a look at the JIRA issue and it seems really similar. There is the same problem with group.ngroups. The count is calculated in the second pass, so we only get results from the useful shards, which is why I get the right count when I increase the rows limit (all my shards must then be used). Unless it's a feature (I hope not), I will create a new JIRA issue for this.

Thanks

On 2012-05-01, at 12:32 PM, Young, Cody wrote:

Hello,

When you say 2 slices, do you mean 2 shards? As in, you're doing a distributed query?

If you're doing a distributed query, then for group.ngroups to work you need to ensure that all documents for a group exist on a single shard.

However, what you're describing sounds an awful lot like this JIRA issue that I entered a while ago for distributed grouping. I found that the hit count was coming only from the shards that ended up having results in the documents that were returned. I didn't test group.ngroups at the time.

https://issues.apache.org/jira/browse/SOLR-3316

If this is a similar issue then you should make a new JIRA issue.

Cody

-----Original Message-----
From: Francois Perron [mailto:francois.per...@wantedanalytics.com]
Sent: Tuesday, May 01, 2012 6:47 AM
To: solr-user@lucene.apache.org
Subject: Grouping ngroups count

Hello all,

I tried to use grouping with 2 slices on an index of 35K documents. When I ask for the top 10 rows, grouped by field A, it gives me about 16K groups. But if I ask for the top 20K rows, the ngroups property is now at 30K.

Do you know why, and of course, how to fix it?

Thanks.

--
Kind regards,

Martijn van Groningen
Re: Solr Cloud vs sharding vs grouping
Hi Jean-Sebastien,

For some grouping features (like total group count and grouped faceting), distributed grouping requires you to partition your documents into the right shard. Basically, groups can't cross shards; otherwise the group counts or grouped facet counts may not be correct. If you use only the basic grouping functionality, this limitation doesn't apply.

I think that right now SolrCloud partitions documents based on the unique id (id % number_shards). You'd need to modify this somehow, or maybe do the distributed indexing yourself.

Martijn

On 20 April 2012 12:07, Lance Norskog goks...@gmail.com wrote:

The implementation of grouping in the trunk is completely different from 236. Grouping works across distributed search: https://issues.apache.org/jira/browse/SOLR-2066, committed last September.

On Thu, Apr 19, 2012 at 6:04 PM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedanalytics.com wrote:

Hi All,

I am currently trying out SolrCloud on a small cluster and I'm enjoying Solr more than ever. Thanks to all the contributors.

That being said, one very important feature for us is the grouping/collapsing of results on a specific field value on a distributed index. We are currently using Solr 1.4 with patch 236 and it does the job as long as all documents with a common field value are on the same shard. Otherwise, grouping on a distributed index will not work as expected.

I looked everywhere to see whether this limitation is still present in the trunk but found no mention of it. Is this still a requirement for grouping results on a distributed index?

Thanks

--
Lance Norskog
goks...@gmail.com

--
Kind regards,

Martijn van Groningen
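A tiny sketch of the contrast Martijn describes (illustrative Python, hypothetical field names): partitioning by unique id can split a group across shards, while routing by the group field keeps it together.

```python
def shard_by_id(doc, num_shards):
    # Default-style partitioning: unique id modulo number of shards.
    return doc["id"] % num_shards

def shard_by_group(doc, num_shards):
    # Group-aware partitioning: hash the collapse/group field instead.
    return hash(doc["group"]) % num_shards

# Two documents in the same group "A" but with different unique ids:
group_a = [{"id": 1, "group": "A"}, {"id": 2, "group": "A"}]
shards_by_id = {shard_by_id(d, 2) for d in group_a}        # group "A" gets split
shards_by_group = {shard_by_group(d, 2) for d in group_a}  # group "A" stays whole
```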
Re: To truncate or not to truncate (group.truncate vs. facet)
The group.facet option only works for field facets (facet.field). Other facet types (query, range and pivot) aren't supported yet. group.facet works for both single and multivalued fields specified in the facet.field parameter.

Martijn

On 9 April 2012 20:58, danjfoley d...@micamedia.com wrote:

I am using group.facet and it works fine for a regular facet.field, but not for facet.query.

Sent from my phone

----- Reply message -----
From: Young, Cody [via Lucene] ml-node+s472066n3897487...@n3.nabble.com
Date: Mon, Apr 9, 2012 1:38 pm
Subject: To truncate or not to truncate (group.truncate vs. facet)
To: danjfoley d...@micamedia.com

One other thing, I believe that you need to be using facet.field on single-valued string fields for group.facet to function properly. Are the fields you're faceting on multiValued=false?

Cody

-----Original Message-----
From: Young, Cody [mailto:cody.yo...@move.com]
Sent: Monday, April 09, 2012 10:36 AM
To: solr-user@lucene.apache.org
Subject: RE: To truncate or not to truncate (group.truncate vs. facet)

You tried adding the parameter group.facet=true?

Cody

-----Original Message-----
From: danjfoley [mailto:d...@micamedia.com]
Sent: Monday, April 09, 2012 10:09 AM
To: solr-user@lucene.apache.org
Subject: Re: To truncate or not to truncate (group.truncate vs. facet)

I did get this working with version 4. However, my facet queries still don't group.

Sent from my phone

----- Reply message -----
From: Young, Cody [via Lucene] ml-node+s472066n3897366...@n3.nabble.com
Date: Mon, Apr 9, 2012 12:45 pm
Subject: To truncate or not to truncate (group.truncate vs. facet)
To: danjfoley d...@micamedia.com

I believe you're looking for what's called "Matrix Counts". Please see this JIRA issue. To my knowledge it has been committed in trunk but not 3.x.

https://issues.apache.org/jira/browse/SOLR-2898

This feature is accessed by using group.facet=true

Cody

-----Original Message-----
From: danjfoley [mailto:d...@micamedia.com]
Sent: Saturday, April 07, 2012 7:02 PM
To: solr-user@lucene.apache.org
Subject: Re: To truncate or not to truncate (group.truncate vs. facet)

I've been searching for a solution to my issue, and this seems to come closest to it. But not exactly.

I am indexing clothing. Each article of clothing comes in many sizes and colors, and can belong to any number of categories. For example, I add 6 documents to Solr as follows:

product, color, size, category
shirt A, red, small, valentines day
shirt A, red, large, valentines day
shirt A, blue, small, valentines day
shirt A, blue, large, valentines day
shirt A, green, small, valentines day
shirt A, green, large, valentines day

I'd like my facet counts to return as follows:

color: red (1), blue (1), green (1)
size: small (1), large (1)
category: valentines day (1)

But they come back like this:

color: red (2), blue (2), green (2)
size: small (2), large (2)
category: valentines day (6)

I see that the group.facet parameter in version 4.0 does exactly this. However, how can I make this happen now? There are all sorts of ecommerce systems out there that facet exactly how I'm asking. I thought Solr is supposed to be the very best, fastest search system, yet it doesn't seem to be able to facet correctly for items with multiple values. Am I indexing my data wrong? How can I make this happen?

--
View this message in context: http://lucene.472066.n3.nabble.com/To-truncate-or-not-to-truncate-group-truncate-vs-facet-tp3838797p3893744.html
Sent from the Solr - User mailing list archive at Nabble.com.
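The group.facet ("Matrix Counts", SOLR-2898) behaviour discussed in this thread can be sketched in a few lines (illustrative Python, not Solr code): a facet value is counted once per group that contains it, rather than once per document, which yields exactly the counts Dan was after.

```python
from collections import defaultdict

def grouped_facet_counts(docs, group_field, facet_field):
    # Count each (group, facet value) pair at most once, so a value
    # contributes 1 per group rather than 1 per document.
    seen = set()
    counts = defaultdict(int)
    for doc in docs:
        pair = (doc[group_field], doc[facet_field])
        if pair not in seen:
            seen.add(pair)
            counts[doc[facet_field]] += 1
    return dict(counts)
```

Run against the six shirt documents above, this produces red (1), blue (1), green (1), small (1), large (1) and valentines day (1), matching the desired facet.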
Re: Distributed grouping issue
The matches element in the response should return the number of documents that matched the query, not the number of groups. Did you encounter this issue with other Solr versions as well (3.5 or another nightly build)?

Martijn

On 2 April 2012 09:41, fbrisbart fbrisb...@bestofmedia.com wrote:

Hi, when you write "I get xxx results", does it come from 'numFound'? Or do you really display xxx results?

When using both field collapsing and sharding, the 'numFound' may be wrong. In that case, think about using the 'shards.rows' parameter with a high value (be careful, it's bad for performance).

If the problem is really about the returned results, it may be because several documents have the same unique key document_id in different shards.

Hope it helps,
Franck

On Friday, 30 March 2012 at 23:52 +0000, Young, Cody wrote:

I forgot to mention, I can see the distributed requests happening in the logs:

Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core2] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core2&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=2
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core4] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core1] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core1&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core3] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core3&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core0] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core0&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core6] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=0
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core7] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core7&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=3
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core5] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core5&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core4] webapp=/solr path=/select params={distrib=false&group.distributed.second=true&wt=javabin&version=2&rows=10&group.topgroups.group_field=4183765296&group.topgroups.group_field=4608765424&group.topgroups.group_field=3524954944&group.topgroups.group_field=4182445488&group.topgroups.group_field=4213143392&group.topgroups.group_field=4328299312&group.topgroups.group_field=4206259648&group.topgroups.group_field=3465497912&group.topgroups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=2
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core6] webapp=/solr path=/select params={distrib=false&group.distributed.second=true&wt=javabin&version=2&rows=10&group.topgroups.group_field=4183765296&group.topgroups.group_field=4608765424&group.topgroups.group_field=3524954944&group.topgroups.group_field=4182445488&group.topgroups.group_field=4213143392&group.topgroups.group_field=4328299312&group.topgroups.group_field=4206259648&group.topgroups.group_field=3465497912&group.topgroups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0
Re: Distributed grouping issue
All documents of a group exist on a single shard; there are no cross-shard groups. You only have to partition documents by group when the group count and some other features need to be accurate. For matches this is not necessary: the matches are summed up while merging the shard responses.

I can't reproduce the error you are describing on a small local setup I have here. I have two Solr cores with a simple schema. Each core has 3 documents. When grouping, the matches element returns 6. I'm running trunk that I updated 30 minutes ago.

Can you try to isolate the problem by testing with a small subset of your data?

Martijn
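The point about matches can be shown with a one-line sketch (illustrative, not Solr code): the merged matches value is just the sum of the per-shard matches counts, which is why it needs no group co-location.

```python
def merge_matches(shard_responses):
    # Each shard reports how many documents matched the query; the
    # coordinating node simply sums them while merging responses.
    return sum(r["matches"] for r in shard_responses)
```

For the two-cores-of-three-documents setup described above, the merged value is 6 regardless of how the groups are distributed.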
Re: Distributed grouping issue
I tried to reproduce this. However, matches always returns 4 in my case (when using rows=1 and rows=2). In your case, the 2 documents on each core do belong to the same group, right?

I did find something else: if I use rows=0 then an error occurs. I think we need to investigate this further. Can you open an issue in Jira? I'm a bit busy today; we can then look into this further in the coming days.

Martijn

On 2 April 2012 23:00, Young, Cody cody.yo...@move.com wrote:

Okay, I've played with this a bit more. Found something interesting:

When the groups returned do not include results from a core, then the core is excluded from the count. (I have 1 group, 2 documents per core.)

Example:

http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=1

<lst name="grouped">
  <lst name="group_field">
    <int name="matches">2</int>

Then, just by changing to rows=2:

http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=2

<lst name="grouped">
  <lst name="group_field">
    <int name="matches">4</int>

Let me know if you have any luck reproducing.

Thanks,
Cody

-----Original Message-----
From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of Martijn v Groningen
Sent: Monday, April 02, 2012 1:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Distributed grouping issue

All documents of a group exist on a single shard; there are no cross-shard groups. You only have to partition documents by group when the group count and some other features need to be accurate. For matches this is not necessary: the matches are summed up while merging the shard responses.

I can't reproduce the error you are describing on a small local setup I have here. I have two Solr cores with a simple schema. Each core has 3 documents. When grouping, the matches element returns 6. I'm running trunk that I updated 30 minutes ago.

Can you try to isolate the problem by testing with a small subset of your data?

Martijn

--
Kind regards,

Martijn van Groningen
Re: Grouping queries
On 22 March 2012 03:10, Jamie Johnson jej2...@gmail.com wrote:

I need to apologize: I believe that in my example I too grossly oversimplified the problem and it's not clear what I am trying to do, so I'll try again.

I have a situation where I have a set of access controls, say user, super user and ultra user. These controls are not necessarily hierarchical, in that user < super user < ultra user. Each of these controls should only be able to see documents matching some combination of the access controls they have. In my actual case we have many access controls and they can be combined in a number of ways, so I can't simply constrain what they are searching by a query alone (i.e. if it's a user, the query is auth:user AND (some query)).

Now I have a case where a document contains information that a user can see but also contains information a super user can see. Our current system marks this document at the super user level and the user can't see it. We now have a requirement to make the pieces that are at the user level available to the user while still allowing the super user to see and search all the information. My original thought was to simply index the document twice; this would end up in a possible duplicate (say if a user had both user and super user), but since this situation is rare it may not matter. After coming across the grouping capability in Solr, I figured I could execute a group query where we grouped on some key which indicated that 2 documents were the same, just with different access controls (user and super user in this example). We could then filter out the documents in the group the user isn't allowed to see and only keep the document with the access controls they have.

Maybe I'm not understanding this right... but why can't you save the access controls as a multivalued field in your schema? In your example you can then, if the current user is a normal user, just query auth:user AND (query), and if the current user is a super user, auth:superuser AND (query). A document that is searchable for both superuser and user is then returned (if it matches the rest of the query).

I hope this makes more sense. Unfortunately, I don't believe the join queries will work, because if I created documents for each access control, I don't think I could search across these documents as if they were a single document (i.e. search for something in the user document and something in the super user document in a single query). This led me to believe that grouping was the way to go in this case, but again I am very interested in any suggestions the community could offer.

I wouldn't use grouping. The Solr join is still an option. Let's say you have many access controls and the access controls change often on your documents. You can then choose to store the access controls, with an id pointing to your logical document, as separate documents in a different Solr core (index). In the core where your main documents are, you don't keep the access controls. You can then use the Solr join to filter out documents that the current user isn't supposed to search. Something like this:

q=(query)&fq={!join fromIndex=core1 from=doc_id to=id}auth:superuser

Core1 is the core containing the access control documents, and doc_id is the id that points to your regular documents. The benefit of this approach is that if you fine-tune core1 for frequent updates, you can change your access controls very often without paying a big performance penalty.
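Martijn's join-as-filter idea can be sketched like this (illustrative Python; the core and field names follow the hypothetical example above, not a real schema): the ACL documents in core1 determine which main-document ids survive the fq.

```python
def join_filter(main_docs, acl_docs, auth_level):
    # Emulates fq={!join fromIndex=core1 from=doc_id to=id}auth:<level>:
    # gather the doc_ids of ACL documents matching the auth level, then
    # keep only main documents whose id is in that set.
    allowed = {a["doc_id"] for a in acl_docs if auth_level in a["auth"]}
    return [d for d in main_docs if d["id"] in allowed]
```

Because the ACLs live in their own index, re-indexing an ACL document is cheap compared to re-indexing the full logical document.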
Re: Grouping queries
Where is Join documented? I looked at http://wiki.apache.org/solr/Join and see no reference to fromIndex. Also, does this work in a distributed environment?

The fromIndex isn't documented in the wiki. It is mentioned in the issue and you can find it in the Solr code:
https://issues.apache.org/jira/browse/SOLR-2272?focusedCommentId=13024918&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13024918

The Solr join only works in a distributed environment if you partition your documents properly. Documents that link to each other need to reside on the same shard, and this can be a problem in some cases.

Martijn
Re: Grouping queries
I'm not sure if grouping is the right feature to use for your requirements... Grouping does have an impact on performance, which you need to take into account. Depending on what grouping features you're going to use (grouped facets, ngroups), grouping performs well on large indices if you use filter queries well (e.g. 100M travel offers, but during searching you're only interested in travel offers to specific destinations or in a specific period of time). The best way to find this out is to just try it out.

On 21 March 2012 22:34, Jamie Johnson jej2...@gmail.com wrote:

I was wondering how much more intensive grouping queries are in general. I am considering using grouping queries as my primary query because I have the need to store a document as pieces with varying access controls; for instance, a portion of a document a user can see, but an admin can see the entire thing (I'm greatly simplifying this). My thought was to do a grouping request and group on a field which contained a key which the documents all shared, but I am worried about how well this will perform at scale. Any thoughts/suggestions on this would be appreciated.

--
Kind regards,

Martijn van Groningen
Re: Solr group witch minimum count in each group
Filtering results based on group count isn't supported yet. There is already an issue created for this feature: https://issues.apache.org/jira/browse/SOLR-3152

Martijn

On 21 March 2012 11:52, ViruS svi...@gmail.com wrote:

Hi,

I am trying to get all duplicated documents in my index. I have a signature field and I can now find duplicates with facet.field=signature, but I want to use only one request to get the groups of duplicated documents. In the end I stopped at this query:

/solr/select?q=*:*&group=true&group.field=signature&group.truncate=true&group.ngroups=true

I am searching for something similar to the MySQL query:

SELECT * FROM table GROUP BY signature HAVING COUNT(*) > 2

Thanks in advance

--
Piotr (ViruS) Sikora
svi...@gmail.com
JID: vi...@hostv.pl

--
Kind regards,

Martijn van Groningen
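Until SOLR-3152 lands, the HAVING-style filter can be approximated client-side. A minimal sketch (illustrative only; the count threshold is a parameter):

```python
from collections import Counter

def duplicated_signatures(docs, field="signature", min_count=2):
    # Client-side GROUP BY signature HAVING COUNT(*) >= min_count:
    # count documents per signature and keep only the duplicated values.
    counts = Counter(d[field] for d in docs)
    return {sig for sig, n in counts.items() if n >= min_count}
```

The returned signatures can then be used to fetch the duplicate documents with a follow-up filter query.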
Re: To truncate or not to truncate (group.truncate vs. facet)
Hi Rasmus, You might want to use the group.facet parameter: http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters I think that will give you the right facet counts with faceting. The parameter is not available in Solr 3.x, so you'll need to use a 4.0 nightly build. Martijn On 19 March 2012 13:39, Erick Erickson erickerick...@gmail.com wrote: Groups and faceting are orthogonal and really have nothing to do with each other, so that might be where part of the problem lies. In your example, you can consider grouping by brand and count the *groups* returned, not elements within those groups. Then you're simply counting up the groups returned in the response packet. Note that this is NOT the stuff down in the facets section. You can still facet by color just as you are. Another possibility is to look at pivot faceting: https://issues.apache.org/jira/browse/SOLR-792 but I confess I haven't explored this so don't really understand if it applies to your problem. Best Erick On Mon, Mar 19, 2012 at 5:22 AM, Rasmus Østergård r...@vertica.dk wrote: I have an index that contains variants of cars. In this small sample I have 2 car models (Audi and Volvo) and the Audi is available in black or white, whereas the Volvo is only available in black. On the search page I want to display products not variants - in this test case 2 products should be shown: Audi A4 and Volvo V50 I'm having trouble achieving the desired count in my search facets. 
This is my test schema:

<fields>
  <field name="variant_id" type="string" indexed="true" stored="true" required="true"/>
  <field name="sku" type="string" indexed="true" stored="true"/>
  <field name="color" type="string" indexed="true" stored="true"/>
</fields>
<uniqueKey>variant_id</uniqueKey>

Test data:

<doc>
  <field name="sku">Audi A4</field>
  <field name="brand">audi</field>
  <!-- Variant properties -->
  <field name="variant_id">A4_black</field>
  <field name="color">black</field>
</doc>
<doc>
  <field name="sku">Audi A4</field>
  <field name="brand">audi</field>
  <!-- Variant properties -->
  <field name="variant_id">A4_white</field>
  <field name="color">white</field>
</doc>
<doc>
  <field name="sku">Volvo V50</field>
  <field name="brand">volvo</field>
  <!-- Variant properties -->
  <field name="variant_id">Volvo_V50</field>
  <field name="color">black</field>
</doc>

My goal is to get this facet:

brand - audi (1), volvo (1)
color - black (2), white (1)

I'm trying to use group.truncate to do this but it fails. This is my search query when not truncating:

/select?facet.field=color&facet.field=brand&defType=edismax&facet=true&group=true&group.field=sku&group.truncate=false

brand - audi (2), volvo (1)
color - black (2), white (1)

This is my search query when truncating:

/select?facet.field=color&facet.field=brand&defType=edismax&facet=true&group=true&group.field=sku&group.truncate=true

brand - audi (1), volvo (1)
color - black (2), white (0)

As can be seen, neither of the two above matches my goal. The problem seems to be that I want truncation on only selected facets. Is this possible?

Thanks!

--
Kind regards,

Martijn van Groningen
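The difference between the two options can be reproduced with a short sketch (illustrative Python over Rasmus's three variant documents, not Solr code): group.truncate facets only each group's head document, while group.facet counts a value once per group that contains it, which is exactly the facet Rasmus wants.

```python
from collections import Counter

cars = [
    {"sku": "Audi A4", "brand": "audi", "color": "black"},
    {"sku": "Audi A4", "brand": "audi", "color": "white"},
    {"sku": "Volvo V50", "brand": "volvo", "color": "black"},
]

def truncate_counts(docs, group_field, facet_field):
    # group.truncate=true: keep only the first (head) document of each
    # group, then facet over those heads.
    heads = {}
    for d in docs:
        heads.setdefault(d[group_field], d)
    return dict(Counter(h[facet_field] for h in heads.values()))

def group_facet_counts(docs, group_field, facet_field):
    # group.facet=true: count each facet value once per group containing it.
    pairs = {(d[group_field], d[facet_field]) for d in docs}
    return dict(Counter(v for _, v in pairs))
```

Truncation gives black (2) with white absent (the Audi head happens to be black), whereas group.facet gives black (2), white (1) alongside audi (1), volvo (1), matching the desired counts.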
Re: JoinQuery and document score problem
Hi Stefan,

The score isn't moved from the "from" side to the "to" side, and as far as I know there isn't a way to configure the scoring of the joined documents. The Solr join query isn't a real join (like in SQL) and should be used as a filtering mechanism. The best way to achieve that is to put the join query inside an fq parameter.

Martijn

On 5 March 2012 14:01, Stefan Moises moi...@shoptimax.de wrote:

Hi list,

we are using the kinda new JoinQuery feature in Solr 4.x trunk (and also Solr 3.5 with the JoinQuery patch applied) and are facing a problem...

We have documents with a parent-child relationship, where a parent can have any number of children, parents being identified by the field parentid. Now, after a join (from the field parentid to id) to get the parent documents only (and to filter out the variants/children of the parent documents), the document score gets lost: all the returned documents have a score of 1.0. If we remove the join from the query, the scores are fine again.

Here is an example call:

http://localhost:8983/solr4/select?qt=dismax&q={!join%20from=parentid%20to=id}foo&fl=id,title,score

All the results now have a score of 1.0, which makes the order of results pretty much random and the scoring therefore useless... :( (The same applies for the standard query type, so it's not the dismax parser.)

I can't imagine this is expected behaviour...? Is there an easy way to get the right scores for the joined documents (e.g. using the max score of the children)? Can the scoring of joined documents be configured somewhere/somehow?

Thanks a lot in advance,
best regards,
Stefan

--
With best regards from Nürnberg,
Stefan Moises

*******************************************
Stefan Moises
Senior Software Developer
Head of Module Development

shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck

Tel.: 0911/25566-25
Fax: 0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de
*******************************************

--
Kind regards,

Martijn van Groningen
Re: Multiple Sort for Group/Folding
Hi Mauro, During the first pass search the sort param is used to determine the top N groups. Then during the second pass search the documents inside the top N groups are sorted using the group.sort parameter. The group.sort doesn't change how the groups themselves are sorted. Martijn On 11 January 2012 08:03, Mauro Asprea mauroasp...@gmail.com wrote: Hi, I'm having some issues trying to sort my grouped results by more than one field. If I use just one, independently of which one, it works fine (I mean it sorts). I have a case where the first sorting key is equal for all the head docs of each group, so I expect the groups to be returned sorted by the second sorting key. But that's not the case. It only sorts using the first key no matter what. This is my query:

fq=type:Movie&fq={!tag=cg01w4p3bcj3}date_start_dt:[2012\-01\-11T06\:47\:38Z TO 2012\-01\-12T08\:59\:59Z]&fq=event_id_i:[1 TO *]&sort=location_weight_i desc, weight_i desc&q=Village Avellaneda&fl=* score&qf=name_texts location_name_text&defType=dismax&start=0&rows=12&group=true&group.field=event_id_str_s&group.field=location_name_str_s&group.sort=date_start_dt asc&group.limit=10&group.limit=1&facet=true&facet.query={!ex=cg01w4p3bcj3}date_start_dt:[2012\-01\-11T06\:47\:38Z TO 2012\-01\-12T08\:59\:59Z]&facet.query={!ex=cg01w4p3bcj3}date_start_dt:[2012\-01\-12T09\:00\:00Z TO 2012\-01\-13T08\:59\:59Z]&facet.query={!ex=cg01w4p3bcj3}date_start_dt:[2012\-01\-13T21\:00\:00Z TO 2012\-01\-16T08\:59\:59Z]&facet.query={!ex=cg01w4p3bcj3}date_start_dt:[2012\-01\-11T06\:47\:38Z TO 2012\-01\-18T12\:47\:38Z]&facet.query={!ex=cg01w4p3bcj3}date_start_dt:[2012\-01\-11T06\:47\:38Z TO 2012\-02\-10T12\:47\:38Z]

Using the last Solr release 3.5. Thanks! -- Mauro Asprea E-Mail: mauroasp...@gmail.com Mobile: +34 654297582 Skype: mauro.asprea -- Met vriendelijke groet, Martijn van Groningen
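The two-pass behavior Martijn describes can be sketched as follows (plain Python, not Solr's implementation; field names are illustrative): the `sort` key orders the groups by their best document, and `group.sort` only orders documents inside each group.

```python
docs = [
    {"id": 1, "group": "a", "weight": 5, "date": 3},
    {"id": 2, "group": "a", "weight": 9, "date": 1},
    {"id": 3, "group": "b", "weight": 7, "date": 2},
]

def group_query(docs, group_field, sort_key, group_sort_key, top_n=10):
    groups = {}
    for d in docs:
        groups.setdefault(d[group_field], []).append(d)
    # first pass: order the groups by their best document according to `sort`
    ordered = sorted(groups.values(),
                     key=lambda g: max(sort_key(d) for d in g),
                     reverse=True)[:top_n]
    # second pass: order documents inside each group according to `group.sort`
    return [sorted(g, key=group_sort_key) for g in ordered]

result = group_query(docs, "group",
                     sort_key=lambda d: d["weight"],      # sort=weight desc
                     group_sort_key=lambda d: d["date"])  # group.sort=date asc
print([[d["id"] for d in g] for g in result])  # [[2, 1], [3]]
```

This illustrates Mauro's problem: when the first `sort` key ties for all group heads, the second key only matters for choosing the head inside a group, not for re-ordering the groups against each other.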
Re: Setting group.ngroups=true considerable slows down queries
Hi! As far as I know currently there isn't another way. Unfortunately the performance degrades badly when having a lot of unique groups. I think an issue should be opened to investigate how we can improve this... Question: Does Solr have a decent chunk of heap space (-Xmx)? Because grouping requires quite some heap space (also without group.ngroups=true). Martijn On 9 December 2011 23:08, Michael Jakl jakl.mich...@gmail.com wrote: Hi! On Fri, Dec 9, 2011 at 17:41, Martijn v Groningen martijn.v.gronin...@gmail.com wrote: On what field type are you grouping and what version of Solr are you using? Grouping by string field is faster. The field is defined as follows: <field name="signature" type="string" indexed="true" stored="true"/> Grouping itself is quite fast, only computing the number of groups seems to increase significantly with the number of documents (linear). I was hoping for a faster solution to compute the total number of distinct documents (or in other terms, the number of distinct values in the signature field). Facets came to mind, but as far as I could see, they don't offer a total number of facets either. I'm using Solr 3.5 (upgraded from Solr 3.4 without reindexing). Thanks, Michael On 9 December 2011 12:46, Michael Jakl jakl.mich...@gmail.com wrote: Hi, I'm using the grouping feature of Solr to return a list of unique documents together with a count of the duplicates. Essentially I use Solr's signature algorithm to create the signature field and use grouping on it. To provide good numbers for paging through my result list, I'd like to compute the total number of documents found (= matches) and the number of unique documents (= ngroups). Unfortunately, enabling group.ngroups considerably slows down the query (from 500ms to 23000ms for a result list of roughly 30 documents). Is there a faster way to compute the number of groups (or unique values in the signature field) in the search result?
My Solr instance currently contains about 50 million documents and around 10% of them are duplicates. Thank you, Michael -- Met vriendelijke groet, Martijn van Groningen -- Met vriendelijke groet, Martijn van Groningen
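The cost pattern Michael observes fits how ngroups must be computed: every unique group value seen while collecting matches has to be remembered, so work and memory grow with the number of distinct groups rather than with the page size. A toy sketch (plain Python, not Solr's collector):

```python
def ngroups(matching_docs, group_field):
    # one entry per unique group value: cost is independent of how many
    # rows are returned, but grows with the number of distinct groups
    seen = set()
    for d in matching_docs:
        seen.add(d[group_field])
    return len(seen)

# 50,000 matching docs, 1,000 distinct signatures
docs = [{"signature": "sig%d" % (i % 1000)} for i in range(50_000)]
print(ngroups(docs, "signature"))  # 1000
```

With ~45 million distinct signatures (50 million docs, ~10% duplicates), that per-group bookkeeping is why the query time jumps from 500ms to 23000ms even though only 30 documents are returned.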
Re: Setting group.ngroups=true considerable slows down queries
I'd not make a subtask under SOLR-236 because it is related to a completely different implementation which was never committed. SOLR-2205 is related to general result grouping and I think should be closed. I'd make a new issue for improving the performance of group.ngroups=true when there are a lot of unique groups. Martijn On 12 December 2011 14:32, Michael Jakl jakl.mich...@gmail.com wrote: Hi! On Mon, Dec 12, 2011 at 13:57, Martijn v Groningen martijn.v.gronin...@gmail.com wrote: As far as I know currently there isn't another way. Unfortunately the performance degrades badly when having a lot of unique groups. I think an issue should be opened to investigate how we can improve this... Question: Does Solr have a decent chunk of heap space (-Xmx)? Because grouping requires quite some heap space (also without group.ngroups=true). Thanks for answering. The server has gotten as much memory as the machine can afford (without swapping): -Xmx21g \ -Xms4g \ Shall I open an issue as a subtask of SOLR-236 even though there is already a performance related task (SOLR-2205)? Cheers, Michael -- Met vriendelijke groet, Martijn van Groningen
Re: Field collapsing results caching
There is no cross-query cache for result grouping. The only caching option out there is the group.cache.percent option: http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters Martijn On 8 December 2011 14:29, Kissue Kissue kissue...@gmail.com wrote: Hi, I was just testing field collapsing in my solr admin on solr 3.5.0. I have observed that the results of field collapsing are not being cached, unlike other solr query results. I am doing the same query multiple times and the time taken still remains approximately the same. Is there something I need to configure? Thanks. -- Met vriendelijke groet, Martijn van Groningen
Re: Setting group.ngroups=true considerable slows down queries
Hi Michael, On what field type are you grouping and what version of Solr are you using? Grouping by string field is faster. Martijn On 9 December 2011 12:46, Michael Jakl jakl.mich...@gmail.com wrote: Hi, I'm using the grouping feature of Solr to return a list of unique documents together with a count of the duplicates. Essentially I use Solr's signature algorithm to create the signature field and use grouping on it. To provide good numbers for paging through my result list, I'd like to compute the total number of documents found (= matches) and the number of unique documents (= ngroups). Unfortunately, enabling group.ngroups considerably slows down the query (from 500ms to 23000ms for a result list of roughly 30 documents). Is there a faster way to compute the number of groups (or unique values in the signature field) in the search result? My Solr instance currently contains about 50 million documents and around 10% of them are duplicates. Thank you, Michael -- Met vriendelijke groet, Martijn van Groningen
Re: Group.ngroup parameter memory consumption
BTW this applies for 4.0-dev. In 3x the String instance from a StringIndex is directly used; this is then put into a list. So there is no extra object instance created per group matching the query. Martijn On 12 November 2011 08:49, Rafał Kuć r@solr.pl wrote: Hello! Thanks, that's what I was looking for :) -- Regards, Rafał Kuć http://solr.pl The group.ngroups option collects per search the number of unique groups matching the query. Based on the collected groups it returns the count. So it depends on the number of groups matching the query. To get more in detail: per unique group a BytesRef instance is created to represent the group and this is put into an ArrayList. I think you can use this to roughly calculate the memory used per request. Besides this, group.ngroups relies in most cases on the FieldCache, which also takes a decent amount of memory. But just like with sorting, this cache is shared between requests. Martijn On 11 November 2011 22:34, Rafał Kuć r@solr.pl wrote: Hello! I was wondering if there is a way of calculating the memory consumption of the group.ngroups parameter. I know the answer can be 'that depends', but what I'm actually wondering about is what the memory consumption depends on - the number of documents returned by a query or the number of groups? Or maybe it depends on another factor? -- Regards, Rafał Kuć -- Met vriendelijke groet, Martijn van Groningen
Re: Group.ngroup parameter memory consumption
The group.ngroups option collects per search the number of unique groups matching the query. Based on the collected groups it returns the count. So it depends on the number of groups matching the query. To get more in detail: per unique group a BytesRef instance is created to represent the group and this is put into an ArrayList. I think you can use this to roughly calculate the memory used per request. Besides this, group.ngroups relies in most cases on the FieldCache, which also takes a decent amount of memory. But just like with sorting, this cache is shared between requests. Martijn On 11 November 2011 22:34, Rafał Kuć r@solr.pl wrote: Hello! I was wondering if there is a way of calculating the memory consumption of the group.ngroups parameter. I know the answer can be 'that depends', but what I'm actually wondering about is what the memory consumption depends on - the number of documents returned by a query or the number of groups? Or maybe it depends on another factor? -- Regards, Rafał Kuć -- Met vriendelijke groet, Martijn van Groningen
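Martijn's "per unique group one small object plus one list slot" description lends itself to a back-of-the-envelope estimate. The sketch below (plain Python) uses illustrative JVM-ish constants, not measured figures; the real numbers depend on the JVM, the group value lengths, and the Solr version.

```python
def rough_group_memory(num_groups, avg_value_bytes=16,
                       obj_overhead=32, list_slot=8):
    # per unique group matching the query: one value holder
    # plus one ArrayList slot (all constants are assumptions)
    return num_groups * (obj_overhead + avg_value_bytes + list_slot)

# e.g. one million unique groups in a single request
est = rough_group_memory(1_000_000)
print("%.1f MB" % (est / 1024 / 1024))  # ~53.4 MB
```

The point of the exercise is the shape of the formula, not the constants: per-request cost scales with the number of unique groups matching the query, while the FieldCache cost is paid once and shared across requests.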
Re: Selective Result Grouping
Ok, I think I get this. This could be achieved if one could specify a filter inside a group command, so that only documents that pass the filter get grouped - for example, only group documents with the value image in the mimetype field. This filter would be specified per group command. Maybe we should open an issue for this? Martijn On 1 November 2011 19:58, entdeveloper cameron.develo...@gmail.com wrote: Martijn v Groningen-2 wrote: When using the group.field option values must be the same, otherwise they don't get grouped together. Maybe fuzzy grouping would be nice. Grouping videos and images based on mimetype should be easy, right? Videos have a mimetype that starts with video/ and images have a mimetype that starts with image/. Storing the mime type's type and subtype in separate fields and grouping on the type field would do the job. Of course you need to know the mimetype during indexing, but solutions like Apache Tika can do that for you. Not necessarily interested in grouping by mimetype (that's an analysis issue). I simply used videos and images as an example. I'm not sure what you mean by fuzzy grouping. But my goal is to have collapse be more selective somehow about what gets grouped. As a more specific example, I have a field called 'type', with the following possible field values: image, video, webpage. Basically I want to be able to collapse all the images into a single result so that they don't fill up the first page of the results. This is not possible with the current grouping implementation because if you use group.field=type, it'll group everything. I do not want to collapse videos or webpages, only images. I've attached a screenshot of Google's results page to help explain what I mean. http://lucene.472066.n3.nabble.com/file/n3471548/Screen_Shot_2011-11-01_at_11.52.04_AM.png Hopefully that makes more sense. If it's still not clear I can email you privately.
-- View this message in context: http://lucene.472066.n3.nabble.com/Selective-Result-Grouping-tp3391538p3471548.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
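The per-command filter Martijn proposes in this thread could behave like the toy sketch below (plain Python, not Solr code; the 'type' field values come from the example in the email): only documents matching the filter collapse into one group, everything else passes through ungrouped.

```python
def selective_collapse(docs, should_collapse, group_key):
    out, collapsed = [], {}
    for d in docs:
        if not should_collapse(d):
            out.append(d)            # pass through untouched
            continue
        key = group_key(d)
        if key not in collapsed:     # first hit represents the group
            collapsed[key] = d
            out.append(d)
    return out

docs = [
    {"id": 1, "type": "image"},
    {"id": 2, "type": "webpage"},
    {"id": 3, "type": "image"},
    {"id": 4, "type": "video"},
]
result = selective_collapse(docs, lambda d: d["type"] == "image",
                            lambda d: d["type"])
print([d["id"] for d in result])  # [1, 2, 4]
```

Only the images collapse into a single representative result; the webpage and video keep their own positions, which is the Google-style behavior the asker describes.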
Re: facet with group by (or field collapsing)
collapse.facet=after doesn't exist in Solr 3.3. This parameter exists in the SOLR-236 patches and is implemented differently in the released versions of Solr. From Solr 3.4 you can use group.truncate. The facet counts are then computed based on the most relevant documents per group. Martijn On 3 November 2011 22:47, erj35 eric.ja...@yale.edu wrote: I'm attempting the following query: http://{host}/apache-solr-3.3.0/select/?q=cesy&version=2.2&start=0&rows=10&indent=on&group=true&group.field=SIP&group.limit=1&facet=true&facet.field=REPOSITORYNAME The result is 4 matches, all in 1 group (with group.limit=1). Rather than show facet.field=REPOSITORYNAME's 4 facets, I want to see the REPOSITORYNAME facet with a count of 1 (for the 1 group returned) with the value of the REPOSITORYNAME field in the 1 doc returned in the group. Is this possible? I tried adding the parameter collapse.facet=after, but that seemed to have no effect. -- View this message in context: http://lucene.472066.n3.nabble.com/facet-with-group-by-or-field-collapsing-tp497252p3478515.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
Re: UnInvertedField vs FieldCache for facets for single-token text fields
Hi Michael, The FieldCache is a simpler data structure and easier to create, so I also expect it to be faster. Unfortunately for TextField an UnInvertedField is always used, even if you have one token per document. I think overriding the multiValuedFieldCache method and returning false would work. If you're using 4.0-dev (trunk) I'd use facet.method=fcs (this parameter is only usable if the multiValuedFieldCache method returns false). This is per-segment faceting and the cache will only be extended for new segments. This facet approach is better for indexes with frequent changes. I think this is even faster in your case than just using the FieldCache method (which operates on a top-level reader; after each commit the complete cache is invalid and has to be recreated). Otherwise I'd try facet.method=enum, which is fast if you have fewer distinct facet values (the number of docs doesn't influence the performance that much). The facet.method=enum option is also valid for normal TextFields, so no need to have custom code. Martijn On 3 November 2011 21:16, Michael Ryan mr...@moreover.com wrote: I have some fields I facet on that are TextFields but have just a single token. The fieldType looks like this:

<fieldType name="myStringFieldType" class="solr.TextField" indexed="true" stored="false" omitNorms="true" sortMissingLast="true" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

SimpleFacets uses an UnInvertedField for these fields because multiValuedFieldCache() returns true for TextField. I tried changing the type for these fields to the plain string type (StrField). The facets *seem* to be generated much faster. Is it expected that FieldCache would be faster than UnInvertedField for single-token strings like this? My goal is to make the facet re-generation after a commit as fast as possible. I would like to continue using TextField for these fields since I have a need for filters like LowerCaseFilterFactory, which still produces a single token.
Is it safe to extend TextField and have multiValuedFieldCache() return false for these fields, so that UnInvertedField is not used? Or is there a better way to accomplish what I'm trying to do? -Michael -- Met vriendelijke groet, Martijn van Groningen
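The enum-versus-fc trade-off Martijn mentions can be illustrated with a toy sketch (plain Python over a miniature inverted index, not Solr's implementation): enum-style faceting does one doc-set intersection per distinct term, while FieldCache-style faceting does one counter bump per matching document.

```python
from collections import Counter

# toy inverted index: term -> set of doc ids; values are illustrative
index = {"red": {0, 2}, "blue": {1}, "green": {3, 4}}
matching = {0, 1, 2, 4}  # docs matching the query

def facet_enum(index, matching):
    # facet.method=enum style: iterate terms, intersect with the query's
    # doc set; cheap when there are few distinct values
    return {t: len(ids & matching) for t, ids in index.items() if ids & matching}

def facet_fc(index, matching):
    # FieldCache style: look up each matching doc's value and count it;
    # cost scales with the number of matching documents
    doc_to_term = {d: t for t, ids in index.items() for d in ids}
    return dict(Counter(doc_to_term[d] for d in matching))

print(facet_enum(index, matching))  # {'red': 2, 'blue': 1, 'green': 1}
```

Both produce the same counts; the choice is about which dimension (distinct values vs. matching docs) dominates in your index, which is exactly why enum wins with few distinct facet values.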
Re: joins and filter queries effecting scoring
Have you tried using the join in the fq instead of the q? Like this (assuming user_id_i is a field in the post document type and self_id_i a field in the user document type): q=posts_text:hello&fq={!join from=self_id_i to=user_id_i}is_active_boolean:true In this example the fq produces a docset that contains all user documents that are active. This docset is used as a filter during the execution of the main query (q param), so it only returns posts that contain the text hello for active users. Martijn On 28 October 2011 01:57, Jason Toy jason...@gmail.com wrote: Does anyone have any idea on this issue? On Tue, Oct 25, 2011 at 11:40 AM, Jason Toy jason...@gmail.com wrote: Hi Yonik, Without a join I would normally query user docs with: q=data_text:test&fq=is_active_boolean:true When joining users with posts, I get no results: q={!join from=self_id_i to=user_id_i}data_text:test&fq=is_active_boolean:true&fq=posts_text:hello I am able to use this query, but it gives me the results in an order that I don't want (nor do I understand its order): q={!join from=self_id_i to=user_id_i}data_text:test AND is_active_boolean:true&fq=posts_text:hello I want the order to be the same as I would get from my original q=data_text:test&fq=is_active_boolean:true, but with the ability to join with the post docs. On Tue, Oct 25, 2011 at 11:30 AM, Yonik Seeley yo...@lucidimagination.com wrote: Can you give an example of the request (URL) you are sending to Solr? -Yonik http://www.lucidimagination.com On Mon, Oct 24, 2011 at 3:31 PM, Jason Toy jason...@gmail.com wrote: I have 2 types of docs, users and posts. I want to view all the docs that belong to certain users by joining posts and users together. I have to filter the users with a filter query of is_active_boolean:true so that the score is not affected, but since I do a join, I have to move the filter query to the query parameter so that I can get the filter applied.
The problem is that since is_active_boolean is moved to the query, the score is affected, which returns an order that I don't want. If I leave is_active_boolean:true in the fq parameter, I get no results back. My question is how can I apply a filter query to users so that the score is not affected? -- - sent from my mobile -- - sent from my mobile -- Met vriendelijke groet, Martijn van Groningen
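Martijn's suggestion works because a join in the fq only builds a doc set on the "from" side and uses it to filter the main query, leaving the main query's scores untouched. A toy sketch of that semantics (plain Python, not Solr code; field names follow the example in the thread):

```python
users = [
    {"self_id": 1, "is_active": True},
    {"self_id": 2, "is_active": False},
]
posts = [
    {"id": "p1", "user_id": 1, "text": "hello world", "score": 0.9},
    {"id": "p2", "user_id": 2, "text": "hello again", "score": 0.7},
    {"id": "p3", "user_id": 1, "text": "goodbye",     "score": 0.4},
]

def join_filter(from_docs, from_field, to_field, predicate):
    # models fq={!join from=self_id_i to=user_id_i}is_active_boolean:true:
    # collect the join keys of matching "from" docs, use them as a filter
    allowed = {d[from_field] for d in from_docs if predicate(d)}
    return lambda doc: doc[to_field] in allowed

active = join_filter(users, "self_id", "user_id", lambda u: u["is_active"])
hits = [p for p in posts if "hello" in p["text"] and active(p)]
# scores come from the main query only; the filter never changes them
print([(p["id"], p["score"]) for p in hits])  # [('p1', 0.9)]
```

Putting the join condition in q instead would feed it into scoring (or, for Solr's join, flatten scores to 1.0), which is exactly the ordering problem Jason ran into.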
Re: Solr 3.4 group.truncate does not work with facet queries
Hi Ian, I think this is a bug. After looking into the code, the facet.query feature doesn't take the group.truncate option into account. This needs to be fixed. You can open a new issue in Jira if you want to. Martijn On 28 October 2011 12:09, Ian Grainger i...@isfluent.com wrote: Hi, I'm using grouping with group.truncate=true. The following simple facet query: facet.query=Monitor_id:[38 TO 40] doesn't give the same number as the ngroups result (with group.ngroups=true) for the equivalent filter query: fq=Monitor_id:[38 TO 40] I thought they should be the same - from the Wiki page: 'group.truncate: If true, facet counts are based on the most relevant document of each group matching the query.' What am I doing wrong? If I turn off group.truncate then the counts are the same, as I'd expect - but unfortunately I'm only interested in the grouped results. - I have also asked this question on StackOverflow, here: http://stackoverflow.com/questions/7905756/solr-3-4-group-truncate-does-not-work-with-facet-queries Thanks! -- Ian i...@isfluent.com a...@endissolutions.com +44 (0)1223 257903 -- Met vriendelijke groet, Martijn van Groningen
Re: About the indexing process
Hi Amos, How are you currently indexing files? Are you indexing Solr input documents or just regular files? You can use Solr Cell to index binary files: http://wiki.apache.org/solr/ExtractingRequestHandler Martijn On 25 October 2011 10:21, 刘浪 liu.l...@eisoo.com wrote: Hi, I appreciate you can help me. When I index a file, can I transfer the content of the file and its metadata to Solr to construct the index, instead of the file path and metadata? Thank you! Amos -- Met vriendelijke groet, Martijn van Groningen
Re: Selective Result Grouping
The current grouping functionality using group.field is basically all-or-nothing: all documents will be grouped by the field value or none will. So there would be no way to, for example, collapse just the videos or images like they do in Google. When using the group.field option values must be the same, otherwise they don't get grouped together. Maybe fuzzy grouping would be nice. Grouping videos and images based on mimetype should be easy, right? Videos have a mimetype that starts with video/ and images have a mimetype that starts with image/. Storing the mime type's type and subtype in separate fields and grouping on the type field would do the job. Of course you need to know the mimetype during indexing, but solutions like Apache Tika can do that for you. -- Met vriendelijke groet, Martijn van Groningen
Re: Solr 3.4 Grouping group.main=true results in java.lang.NoClassDefFound
Hi Frank, How is Solr deployed? And how did you upgrade? The commons-lang library (containing ArrayUtils) is included in the Solr war file. Martijn On 28 September 2011 09:16, Frank Romweber fr...@romweber.de wrote: I use Drupal for accessing the Solr search engine. After updating and creating my new index everything works as before. Then I activated group=true and group.field=site, and Solr delivers the wanted search results, but in Drupal nothing appears, just an empty search page. I found out that grouping changes the result set names. No problem: Solr offers the group.main=true parameter for this case. So I added this and get this 500 error. HTTP Status 500 - org/apache/commons/lang/ArrayUtils java.lang.NoClassDefFoundError: org/apache/commons/lang/ArrayUtils at org.apache.solr.search.Grouping$Command.createSimpleResponse(Grouping.java:573) at org.apache.solr.search.Grouping$CommandField.finish(Grouping.java:675) at org.apache.solr.search.Grouping.execute(Grouping.java:339) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:240) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:662) I found out that Solr didn't find the class ArrayUtils. I tried a lot of things to get this to work: setting the JAVA_HOME and CLASSPATH vars, and I changed the JRE, without any success. I am really wondering, as all my other programs are still running; even Solr in the normal mode is working and accessible, but not the group.main=true function. So my question is now: what is necessary to get this to work? Any help is appreciated. Thx frank -- Met vriendelijke groet, Martijn van Groningen
Re: Problem with SolrJ and Grouping
Also the error you described when wt=xml and using SolrJ is fixed in 3.4 (and in trunk / branch_3x). You can wait for the 3.4 release or use a nightly 3x build. Martijn On 12 September 2011 12:41, Sanal K Stephen sanalkstep...@gmail.com wrote: Kirill, Parsing the grouped result using SolrJ is not released yet I think.. it's going to be released with Solr 3.4.0 (SolrJ client cannot parse grouped and range facets results, SOLR-2523). See the release notes of Solr 3.4.0: http://wiki.apache.org/solr/ReleaseNote34 On Mon, Sep 12, 2011 at 3:51 PM, Kirill Lykov lykov.kir...@gmail.com wrote: I found that SolrQuery doesn't work with grouping. I constructed SolrQuery this way: solrQuery = constructFullSearchQuery(searchParams); solrQuery.set("group", true); solrQuery.set("group.field", "GeonameId"); Solr successfully handles the request and writes about that in the log: INFO: [] webapp=/solr path=/select params={start=1&q=*:*&timeAllowed=1500&group.field=GeonameId&group=true&wt=xml&rows=20&version=2.2} hits=12099579 status=0 QTime=2968 The error occurs when SolrJ tries to parse in XMLResponseParser.processResponse (line 324), where builder stores "/lst":

Object val = type.read( builder.toString().trim() );
if( val == null && type != KnownType.NULL ) {
  throw new XMLStreamException( "error reading value:"+type, parser.getLocation() );
}
vals.add( val );
break;

The problem is - val is null. It happens because the handler for the type LST returns null (line 178 in the same file):

LST (false) {
  @Override
  public Object read( String txt ) {
    return null;
  }
},

I don't understand why it works this way. The XML which was returned by Solr is valid. I attached the response XML to the letter. The error occurs in line 3, column 14 661. I use Apache Solr 3.3.0 and the same SolrJ. -- Best regards, Kirill Lykov, Software Engineer, Data East LLC, tel.:+79133816052, LinkedIn profile: http://www.linkedin.com/pub/kirill-lykov/12/860/16 -- Regards, Sanal Kannappilly Stephen -- Met vriendelijke groet, Martijn van Groningen
Re: Nested documents
To support this, we also need to implement indexing block of documents in Solr. Basically the UpdateHandler should also use this method: IndexWriter#addDocuments(Collection documents) On 12 September 2011 01:01, Michael McCandless luc...@mikemccandless.com wrote: Even if it applies, this is for Lucene. I don't think we've added Solr support for this yet... we should! Mike McCandless http://blog.mikemccandless.com On Sun, Sep 11, 2011 at 12:16 PM, Erick Erickson erickerick...@gmail.com wrote: Does this JIRA apply? https://issues.apache.org/jira/browse/LUCENE-3171 Best Erick On Sat, Sep 10, 2011 at 8:32 PM, Andy angelf...@yahoo.com wrote: Hi, Does Solr support nested documents? If not is there any plan to add such a feature? Thanks. -- Met vriendelijke groet, Martijn van Groningen
Re: Problem with SolrJ and Grouping
The changes to that class were minor. There is only support for parsing a grouped response. Check the QueryResponse class; there is a method getGroupResponse(). I ran into similar exceptions when creating the QueryResponseTest#testGroupResponse test. The test uses an XML response from a file. On 12 September 2011 15:38, Kirill Lykov lykov.kir...@gmail.com wrote: Martijn, I can't find the fixed version. I've got the last version of SolrJ but I see only minor changes in XMLResponseParser.java. And it doesn't support grouping yet. I also checked branch_3x, the branch for 3.4. On Mon, Sep 12, 2011 at 5:45 PM, Martijn v Groningen martijn.v.gronin...@gmail.com wrote: Also the error you described when wt=xml and using SolrJ is fixed in 3.4 (and in trunk / branch_3x). You can wait for the 3.4 release or use a nightly 3x build. Martijn On 12 September 2011 12:41, Sanal K Stephen sanalkstep...@gmail.com wrote: Kirill, Parsing the grouped result using SolrJ is not released yet I think.. it's going to be released with Solr 3.4.0 (SolrJ client cannot parse grouped and range facets results, SOLR-2523). See the release notes of Solr 3.4.0: http://wiki.apache.org/solr/ReleaseNote34 On Mon, Sep 12, 2011 at 3:51 PM, Kirill Lykov lykov.kir...@gmail.com wrote: I found that SolrQuery doesn't work with grouping.
I constructed SolrQuery this way: solrQuery = constructFullSearchQuery(searchParams); solrQuery.set("group", true); solrQuery.set("group.field", "GeonameId"); Solr successfully handles the request and writes about that in the log: INFO: [] webapp=/solr path=/select params={start=1&q=*:*&timeAllowed=1500&group.field=GeonameId&group=true&wt=xml&rows=20&version=2.2} hits=12099579 status=0 QTime=2968 The error occurs when SolrJ tries to parse in XMLResponseParser.processResponse (line 324), where builder stores "/lst":

Object val = type.read( builder.toString().trim() );
if( val == null && type != KnownType.NULL ) {
  throw new XMLStreamException( "error reading value:"+type, parser.getLocation() );
}
vals.add( val );
break;

The problem is - val is null. It happens because the handler for the type LST returns null (line 178 in the same file):

LST (false) {
  @Override
  public Object read( String txt ) {
    return null;
  }
},

I don't understand why it works this way. The XML which was returned by Solr is valid. I attached the response XML to the letter. The error occurs in line 3, column 14 661. I use Apache Solr 3.3.0 and the same SolrJ. -- Best regards, Kirill Lykov, Software Engineer, Data East LLC, tel.:+79133816052, LinkedIn profile: http://www.linkedin.com/pub/kirill-lykov/12/860/16 -- Regards, Sanal Kannappilly Stephen -- Met vriendelijke groet, Martijn van Groningen -- Best regards, Kirill Lykov, Software Engineer, Data East LLC, tel.:+79133816052, LinkedIn profile: http://www.linkedin.com/pub/kirill-lykov/12/860/16 -- Met vriendelijke groet, Martijn van Groningen
Re: Sorting groups by numFound group size
Not yet. If you want you can create an issue for sorting groups by numFound. On 9 September 2011 18:49, O. Klein kl...@octoweb.nl wrote: I am also looking for way to sort on numFound. Has an issue been created? -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-groups-by-numFound-group-size-tp3315740p3323420.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
Re: TermsComponent from deleted document
I'd use the suggester: http://wiki.apache.org/solr/Suggester The suggester can give a collation. The TermsComponent can't do that. The suggester builds on top of the spellchecking infrastructure, so should be easy to use if you're familiar with that. Martijn On 10 September 2011 08:37, Manish Bafna manish.bafna...@gmail.com wrote: Which is preferable? using TermsComponent or Facets for autosuggest? On Fri, Sep 9, 2011 at 10:33 PM, Chris Hostetter hossman_luc...@fucit.orgwrote: : http://wiki.apache.org/solr/TermsComponent states that TermsComponent will : return frequencies from deleted documents too. : : Is there anyway to omit the deleted documents to get the frequencies. not really -- until a deleted document is expunged from segment merging, they are still included in the term stats which is what the TermsComponent looks at. If having 100% accurate term counts is really important to you, then you can optimize after doing any updates on your index - but there is obviously a performance tradeoff there. -Hoss -- Met vriendelijke groet, Martijn van Groningen
Re: Sorting groups by numFound group size
No, as far as I know sorting by group count isn't planned. You can create an issue in Jira where future development of this feature can be tracked. On 7 September 2011 23:54, bobsolr xbvbvccvb...@hotmail.com wrote: Hi Martijn, Thanks for the reply. Unfortunately I can't reference the group size using a function or by specifying a schema field, although I use those methods to sort docs within a group. Do you know if sort by group size is available/planned for a future release of Solr (maybe 4.0)? If so, it might be worth a shot to get a trunk version to try that. BTW, I'm using 3.3 now. -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-groups-by-numFound-group-size-tp3315740p3318060.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
Re: Sorting groups by numFound group size
Sorting groups by numFound isn't possible. You can sort groups by specifying a function or a field (from your schema) in the sort parameter. numFound isn't a field, so that is why you can't sort on it. Martijn On 7 September 2011 08:17, bobsolr xbvbvccvb...@hotmail.com wrote: Hi, I'm using this sample query to group the result set by category: q=test&group=true&group.field=category This works as expected and I get this sample response (abridged): response: {numFound:1,start:0,docs:[ { ... } {numFound:6,start:0,docs:[ { ... } {numFound:3,start:0,docs:[ { ... } However, I can't find a way to specify the sort order of the groups by the number of docs each group has (the numFound field). I think the sort param has something to do with it, but I don't know how to use it. Any help will be greatly appreciated! -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-groups-by-numFound-group-size-tp3315740p3315740.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
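Until group-count sorting exists server-side, a client can re-order the page of groups it received by their numFound values. A sketch of that workaround (plain Python over a parsed grouped response; the numFound values mirror the sample in the question, and note it only reorders the groups already returned, not the full result set):

```python
grouped = {"category": {"groups": [
    {"groupValue": "a", "doclist": {"numFound": 1, "docs": []}},
    {"groupValue": "b", "doclist": {"numFound": 6, "docs": []}},
    {"groupValue": "c", "doclist": {"numFound": 3, "docs": []}},
]}}

def sort_groups_by_size(grouped, field):
    # biggest groups first; a purely client-side re-ordering
    groups = grouped[field]["groups"]
    return sorted(groups, key=lambda g: g["doclist"]["numFound"], reverse=True)

print([g["groupValue"] for g in sort_groups_by_size(grouped, "category")])
# ['b', 'c', 'a']
```

The caveat is paging: the server still picks which groups make it onto the page using its own sort, so this only helps when the interesting groups fit in one response.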
Re: Reading results from FieldCollapsing
The CollapseComponent was never committed. This class exists in the SOLR-236 patches. You don't need to change the configuration in order to use grouping. The blog you mentioned is based on the SOLR-236 patches. The current grouping in Solr 3.3 has superseded these patches. From Solr 3.4 (not yet released) the QueryResponse class in SolrJ has a method getGroupResponse. Use this method to get the grouped response. On 31 August 2011 14:10, Erick Erickson erickerick...@gmail.com wrote: Actually, I haven't used the new stuff yet, so I'm not entirely sure either, but that sure would be the place to start. There's some historical ambiguity: grouping started out as field collapsing, and the terms are used interchangeably. If you go to the bug I linked to and open up the patch file, you'll see the code that implements the grouping in SolrJ; that should give you a good place to start. Best Erick On Wed, Aug 31, 2011 at 3:28 AM, Sowmya V.B. vbsow...@gmail.com wrote: Hi Erick I downloaded the latest build from ( https://builds.apache.org/job/Solr-3.x/lastSuccessfulBuild/artifact/artifacts/ ) But I don't find the required class CollapseComponent in the src (org.apache.solr.handler.component.CollapseComponent). The SolrJ in 3.4 does seem to have something like the GroupResponse and GroupCommand classes, which might be the ones I am looking for (though I am not very sure). Regards Sowmya. On Tue, Aug 30, 2011 at 5:14 PM, Erick Erickson erickerick...@gmail.com wrote: Ahhh, see: https://issues.apache.org/jira/browse/SOLR-2637 Short form: it's in 3.4, not 3.3. So your choices are: 1) parse the XML yourself, or 2) get a current 3x build (as in one of the nightlies) and use SolrJ there. Best Erick On Tue, Aug 30, 2011 at 11:09 AM, Sowmya V.B. vbsow...@gmail.com wrote: Hi Erick Yes, I did see the XML format. But I did not understand how to read the response using SolrJ. I found some information about the CollapseComponent on googling, which looks like a normal Solr XML results format.
http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/ However, this class CollapseComponent does not seem to exist in Solr 3.3; (org.apache.solr.handler.component.CollapseComponent) was the component mentioned in that link, and it is not there in the Solr 3.3 class files. Sowmya. On Tue, Aug 30, 2011 at 4:48 PM, Erick Erickson erickerick...@gmail.com wrote: Have you looked at the XML (or JSON) response format? You're right, it is different and you have to parse it differently; there are more levels. Try this query and you'll see the format (default data set): http://localhost:8983/solr/select?q=*:*&group=on&group.field=manu_exact Best Erick On Tue, Aug 30, 2011 at 9:25 AM, Sowmya V.B. vbsow...@gmail.com wrote: Hi All I am trying to use the FieldCollapsing feature in Solr. On the Solr admin interface I give ...&group=true&group.field=fieldA and I can see grouped results. But I am not able to figure out how to read those results in that order in Java. Something like: SolrDocumentList doclist = response.getResults(); gives me a set of results, on which I iterate and get something like doclist.get(1).getFieldValue("title") etc. After grouping, doing the same step throws an error (apparently because the returned XML formats are different too). How can I read groupValues and thereby the other field values of the documents inside that group? S. -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com -- Met vriendelijke groet, Martijn van Groningen
Re: Grouping and performing statistics per group
Hi Omri, I think you can achieve that with grouping and the Solr StatsComponent ( http://wiki.apache.org/solr/StatsComponent ). In order to compute statistics on groups you must set the option group.truncate=true. An example query: q=*:*&group=true&group.field=gender&group.truncate=true&stats=true&stats.field=weight Let me know if this fits your use case. Martijn On 25 August 2011 11:17, Omri Cohen omri...@gmail.com wrote: Thanks, but I actually need something more deterministic and more accurate. Anyone know if there is an already existing feature for that? thanks again -- Met vriendelijke groet, Martijn van Groningen
Re: Grouping and performing statistics per group
Or if you don't care about grouped results you can also add the following option: stats.facet=gender On 25 August 2011 14:40, Martijn v Groningen martijn.v.gronin...@gmail.com wrote: Hi Omri, I think you can achieve that with grouping and the Solr StatsComponent ( http://wiki.apache.org/solr/StatsComponent ). In order to compute statistics on groups you must set the option group.truncate=true. An example query: q=*:*&group=true&group.field=gender&group.truncate=true&stats=true&stats.field=weight Let me know if this fits your use case. Martijn On 25 August 2011 11:17, Omri Cohen omri...@gmail.com wrote: Thanks, but I actually need something more deterministic and more accurate. Anyone know if there is an already existing feature for that? thanks again -- Met vriendelijke groet, Martijn van Groningen
Re: Grouping and performing statistics per group
If you take this query from the wiki: http://localhost:8983/solr/select?q=*:*&stats=true&stats.field=price&stats.field=popularity&stats.twopass=true&rows=0&indent=true&stats.facet=inStock In this case you get stats about the popularity per inStock value (true / false). Replace these values with weight and gender respectively, and if you run the query on your index this should give the result you want. On 25 August 2011 14:47, Omri Cohen omri...@gmail.com wrote: Hi, thanks for your reply, but it doesn't work. I am getting the plain stats results at the end of the response, but no statistics per group. thanks anyway -- Met vriendelijke groet, Martijn van Groningen
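What the stats.field / stats.facet combination computes server-side can be pictured with a small client-side equivalent: per distinct gender value, aggregate the weight values. The record layout and the weightSumPerGender method name below are made up for illustration, and this sketch only computes the sum, whereas the StatsComponent also returns min, max, mean, and so on.

```java
import java.util.HashMap;
import java.util.Map;

public class StatsPerGroup {
    // Sum 'weight' per 'gender' value, mirroring the per-facet-value
    // aggregation that stats.field=weight + stats.facet=gender performs.
    static Map<String, Double> weightSumPerGender(String[] genders, double[] weights) {
        Map<String, Double> sums = new HashMap<>();
        for (int i = 0; i < genders.length; i++) {
            sums.merge(genders[i], weights[i], Double::sum);
        }
        return sums;
    }
}
```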
Re: Results Group-By using SolrJ
Hi Omri, SOLR-2637 was concerned with adding grouped response parsing. There is no convenience method for grouping, but you can use the normal SolrQuery#set(...) methods to enable grouping. The following code should enable grouping via the SolrJ api: SolrQuery query = new SolrQuery(); query.set(GroupParams.GROUP, true); query.set(GroupParams.GROUP_FIELD, "your_field"); Martijn On 14 August 2011 16:53, Omri Cohen omri...@gmail.com wrote: Hi All, I am trying to group results using SolrJ. According to https://issues.apache.org/jira/browse/SOLR-2637 the feature was added, so I upgraded to SolrJ-3.4-Snapshot and I can see the necessary method for grouping in QueryResponse, which is getGroupResponse(). The only thing left that I don't understand is where I set which field to group on. There is no method that looks like it does so on the SolrQuery object. Ideas anyone? thanks, Omri -- Met vriendelijke groet, Martijn van Groningen
Re: SOLR 3.3.0 multivalued field sort problem
The first solution would make sense to me. Some kind of strategy mechanism for this would allow anyone to define their own rules. Duplicating results would be confusing to me. On 13 August 2011 18:39, Michael Lackhoff mich...@lackhoff.de wrote: On 13.08.2011 18:03 Erick Erickson wrote: The problem I've always had is that I don't quite know what sorting on multivalued fields means. If your field had tokens a and z, would sorting on that field put the doc at the beginning or end of the list? Sure, you can define rules (first token, last token, average of all tokens (whatever that means)), but each solution would be wrong sometime, somewhere, and/or completely useless. Of course it would need rules, but I think it wouldn't be too hard to find rules that are at least far better than the current situation. My wish would include an option that decides whether the field is used just once or every value on its own. If the option is set to FALSE, only the first value would be used; if it is TRUE, every value of the field would get its place in the result list. So, if we have e.g. record1: ccc and bbb record2: aaa and zzz it would be either record2 (aaa) record1 (ccc) or record2 (aaa) record1 (bbb) record1 (ccc) record2 (zzz) I find these two outcomes most plausible, so I would allow them if technically possible, but whatever rule looks more plausible to the experts: some solution is better than no solution. -Michael -- Met vriendelijke groet, Martijn van Groningen
Re: SOLR 3.3.0 multivalued field sort problem
Hi Johnny, Sorting on a multivalued field has never really worked in Solr. Solr versions <= 1.4.1 allowed it, but there was a chance that an error occurred and that the sorting might not be what you expect. From Solr 3.1 and up, sorting on a multivalued field isn't allowed and an http 400 is returned. Duplicating documents or fields (what Péter describes) is as far as I know the only option until Lucene supports sorting on multivalued fields properly. Martijn 2011/8/12 Péter Király kirun...@gmail.com Hi, There is no direct solution; you have to create single-value field(s) to sort on. I am aware of two workarounds: - you can use a random or a given (e.g. the first) instance of the multiple values of the field, and that would be your sortable field. - you can create two sortable fields: _min and _max, which contain the minimal and maximal values of the given field's values. At least, that's what I do. Probably there are other solutions as well. Péter -- eXtensible Catalog http://drupal.org/project/xc 2011/8/12 johnnyisrael johnnyi.john...@gmail.com: Hi, I am currently using Solr 1.4.1. With this version sorting works fine even on a multivalued field. Now I am planning to upgrade my Solr version from 1.4.1 to 3.3.0. In this latest version sorting is not working on a multivalued field, so I am unable to upgrade my Solr due to this drawback. Is there a workaround available to fix this problem? Thanks, Johnny -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-3-3-0-multivalued-field-sort-problem-tp3248778p3248778.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
Re: SOLR Support for Lucene Nested Documents
Hi Josh, Solr doesn't expose this Lucene feature yet. To support this Solr needs to be able to index documents in a single block. Also the BlockJoinQuery needs to be exposed to Solr (this can easily happen via a QParserPlugin). Martijn On 5 August 2011 00:00, Joshua Harness jkharnes...@gmail.com wrote: I noticed that lucene supports 'Nested Documents'. However - I couldn't find mention of this feature within SOLR. Does anybody know how to leverage this lucene feature through SOLR? Thanks! Josh Harness -- Met vriendelijke groet, Martijn van Groningen
Re: Minimum Score
As far as I know there is no built-in solution for this like there is for max score. An alternative approach to the one already mentioned is to send a second request with rows=1 and sort=score asc. This will return the lowest-scoring document, and you can then retrieve the score from that document (if fl=*,score). Martijn On 5 August 2011 10:45, Kissue Kissue kissue...@gmail.com wrote: But that would mean returning all the results without pagination, which I don't want to do. I am looking for a way to do it without having to return all the results at once. Thanks. On Thu, Aug 4, 2011 at 11:18 PM, Darren Govoni dar...@ontrenet.com wrote: Off the top of my head, maybe you can get the number of results and then look at the last document and check its score. I believe the results will be ordered by score? On 08/04/2011 05:44 PM, Kissue Kissue wrote: Hi, I am using Solr 3.1 with the SolrJ client library. I can see that it is possible to get the maximum score for your search by using the following: response.getResults().getMaxScore() I am wondering: is there some simple solution to get the minimum score? Many thanks. -- Met vriendelijke groet, Martijn van Groningen
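The second-request trick above can be expressed as plain request parameters. The helper below just builds the query string for that lowest-score request; the parameter names (q, rows, sort, fl) are the standard select-handler ones, but the MinScoreParams class itself is a made-up convenience, not part of SolrJ.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class MinScoreParams {
    // Build the query string for a "give me only the lowest-scoring doc"
    // request: one row, sorted by score ascending, returning only the score.
    static String build(String q) {
        String encoded = URLEncoder.encode(q, StandardCharsets.UTF_8);
        return "q=" + encoded + "&rows=1&sort=score+asc&fl=score";
    }
}
```

Appending these parameters to the select handler URL of the same search gives a one-document response whose score is the minimum for that query.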
Re: A rant about field collapsing
The development of the field collapse feature is a long and confusing story. The main point is that SOLR-236 was never going to scale and the performance in general was bad. A new approach was needed. This was implemented in SOLR-1682 and added to the trunk (4.0-dev) around September last year. Later in LUCENE-1421 the code was moved from Solr to Lucene as a module / contrib. After that the grouping module and contrib were wired into Solr 3.3 and 4.0-dev in SOLR-2524 and SOLR-2564. Field collapsing is not gone, but it is just a form of result grouping. The core SOLR-236 feature is in 3.3 / 4.0-dev. Other features that SOLR-236 offered will eventually get in. Like for example post grouping facets. The http parameters and response have changed without keeping it compatible with the SOLR-236 patches. I think that isn't a problem since SOLR-236 was never a committed feature. But a widely used feature should never be attached as a patch to a Jira issue for 3+ years On 3 August 2011 18:33, baronDodd barond...@googlemail.com wrote: I am working on an implementation of search within our application using solr. About 2 months ago we had the need to group results by a certain field. After some searching I came across the JIRA in progress for this - field collapsing: https://issues.apache.org/jira/browse/SOLR-236 It was scheduled for the next solr release and had a full set of proper JIRA subtasks and patch files of almost complete implementations attached. So as you can imagine I was happy to apply this patch and build it into our application and await for the next release when it would be part of the main trunk. Now imagine my surprise when we have come around to upgrade to see that suddenly field collapsing has been thrown away in favour of a totally different grouping implementation https://issues.apache.org/jira/browse/SOLR-2524 How was it decided that this would be used instead? 
It was not made very clear that LUCENE-1421 was in progress which would effectively make the field collapsing work irrelevant by fixing the problem in lucene rather than primarily in solr. This has cost me days of work to now merge our custom changes somehow to the new implementation. I guess it is my own fault for basing our custom changes around an unresolved enhancement but as SOLR-236 had been 3-4 years in progress and SOLR-2524 did not exist at the time it seemed pretty safe to assume that the same problem was not being fixed in 2 totally different ways! -- View this message in context: http://lucene.472066.n3.nabble.com/A-rant-about-field-collapsing-tp3222798p3222798.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
Re: A rant about field collapsing
Well, the original page moved to: http://wiki.apache.org/solr/FieldCollapsingUncommitted Assuming that you're using Solr 3.3, you can't get the grouped result (<lst name="grouped">) with SolrJ. I added grouping support to SolrJ some time ago and it will be in Solr 3.4. You can use a nightly 3.x build to use the grouping support now. You can also use the group.main=true option, which returns a response that is compatible with the normal search response. However, you can then only use one group command per request (group.field, group.func or group.query). Also, there were some bugs with this response format in Solr 3.3 that have been fixed and will be included when Solr 3.4 is released. Martijn On 4 August 2011 15:24, baronDodd barond...@googlemail.com wrote: Ok, thank you very much for clearing that up a little. I think another reason I was confused was that the wiki page for grouping was based around the original field collapsing plan at the time, which led me to the Jira and hence the patch files. Rant over! Perhaps you can help to clarify whether the current grouping changes work with SolrJ? In QueryResponse.setResponse() there is a loop which builds up the results object, but it has no check at present for "grouped" in the NamedList, so the SolrJ client gets no results back when searching with grouping parameters. I assume I can add this on my local working copy and all will be well? -- View this message in context: http://lucene.472066.n3.nabble.com/A-rant-about-field-collapsing-tp3222798p3225361.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
Re: ideas for versioning query?
Hi Mike, how many docs and groups do you have in your index? I think the group.sort option fits your requirements. If I remember correctly, group.ngroups=true adds something like 30% extra time on top of the search request with grouping, but that was on my local test dataset (~30M docs, ~8000 groups) and my machine. You might see different search times when setting group.ngroups=true. Martijn 2011/8/1 Mike Sokolov soko...@ifactory.com Thanks, Tomas. Yes, we are planning to keep a "current" flag in the most current document. But there are cases where, for a given user, the most current document is not that one, because they only have access to some older documents. I took a look at http://wiki.apache.org/solr/FieldCollapsing and it seems as if it will do what we need here. My one concern is that it might not be efficient at computing group.ngroups for a very large number of groups, which we would ideally want. Is that something I should be worried about? -Mike On 08/01/2011 10:08 AM, Tomás Fernández Löbbe wrote: Hi Michael, I guess this could be solved using grouping as you said. Documents inside a group can be sorted on a field (in your case, the version field; see the group.sort parameter), and you can show only the first one. It will be more complex to show facets (post-grouping faceting is work in progress but still not committed to the trunk). It would be easier from the Solr side if you could do something at index time, like indicating which document is the current one and which is an old one (you would need to update the old document whenever a new version is indexed). Regards, Tomás On Mon, Aug 1, 2011 at 10:47 AM, Mike Sokolov soko...@ifactory.com wrote: A customer has an interesting problem: some documents will have multiple versions. In search results, only the most recent version of a given document should be shown.
The trick is that each user has access to a different set of document versions, and each user should see only the most recent version of a document that they have access to. Is this something that can reasonably be solved with grouping? In 3.x? I haven't followed the grouping discussions closely: would someone point me in the right direction please? -- Michael Sokolov Engineering Director www.ifactory.com -- Met vriendelijke groet, Martijn van Groningen
Re: SolrJ and class versions
Were you upgrading from Solr 1.4? SolrJ by default uses the javabin format for querying (the wt parameter). The javabin format is not compatible between 1.4 and 3.1 and above, so if your clients were running SolrJ 1.4 versions I would expect errors to occur. Martijn On 25 July 2011 12:15, Tarjei Huse tar...@scanmine.com wrote: Hi, I recently went through a little hell when I upgraded my Solr servers to 3.2.0. What I didn't anticipate was that my Java SolrJ clients depend on the server version. I would like to add a note about this in the SolrJ docs: http://wiki.apache.org/solr/Solrj#Streaming_documents_for_an_update Any comments with regard to this? Are all SolrJ methods dependent on Java serialization and thus class versions? -- Regards / Med vennlig hilsen Tarjei Huse Mobil: 920 63 413 -- Met vriendelijke groet, Martijn van Groningen
Re: SolrJ and class versions
I agree! It should be noted in the documentation. I just wanted to say that SolrJ doesn't depend on Java serialization, but uses its own serialization: http://lucene.apache.org/solr/api/solrj/org/apache/solr/common/util/JavaBinCodec.html Martijn On 26 July 2011 15:31, Tarjei Huse tar...@scanmine.com wrote: On 07/26/2011 09:26 AM, Martijn v Groningen wrote: Were you upgrading from Solr 1.4? Yep. SolrJ by default uses the javabin format for querying (the wt parameter). The javabin format is not compatible between 1.4 and 3.1 and above, so if your clients were running SolrJ 1.4 versions I would expect errors to occur. I understood the error when I saw it, but I think the version dependency should be noted in the SolrJ manual. Regards, T Martijn On 25 July 2011 12:15, Tarjei Huse tar...@scanmine.com wrote: Hi, I recently went through a little hell when I upgraded my Solr servers to 3.2.0. What I didn't anticipate was that my Java SolrJ clients depend on the server version. I would like to add a note about this in the SolrJ docs: http://wiki.apache.org/solr/Solrj#Streaming_documents_for_an_update Any comments with regard to this? Are all SolrJ methods dependent on Java serialization and thus class versions? -- Regards / Med vennlig hilsen Tarjei Huse Mobil: 920 63 413 -- Met vriendelijke groet, Martijn van Groningen
Re: Possible bug in Solr 3.3 grouping
Hi Nikhil, Thanks for raising this issue. I checked this particular issue in a test case and I ran into the same error, so this is indeed a bug. I've fixed this issue for 3x in revision 1145748, so checking out the latest 3x branch and building Solr yourself should give you this bug fix. Or you can wait until the 3x build produces new nightly artifacts: https://builds.apache.org/job/Solr-3.x/lastSuccessfulBuild/artifact/artifacts/ The numFound in your case is based on the number of documents, not groups. So you might still get an empty result, because there might be fewer than 40 groups in this result set. You can see the number of groups by using the group.ngroups=true parameter; this includes the number of groups, but the group.format must be grouped, otherwise you don't get the number of groups in the response. Martijn On 12 July 2011 09:00, Nikhil Chhaochharia nikhil...@yahoo.com wrote: Hi, I am using Solr 3.3 and have run into a problem with grouping. If 'group.main' is 'true' and 'start' is greater than 'rows', then I do not get any results. A sample response is: <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">602</int></lst> <lst name="grouped"/> <result name="response" numFound="2239538" start="40" maxScore="6.140459"/> </response> If 'group.main' is false, then I get results. Did anyone else come across this problem? Using the grouping feature with pagination of results will make start greater than rows from the third page onwards. Thanks, Nikhil
Re: SolrJ and Range Faceting
No, that doesn't work yet. I think we'll need to enhance the Solr query parser in order to support this. If that is supported we don't need to use the DateMathParser, and this is a more general solution for other client libraries as well. Martijn On 12 June 2011 20:58, Jamie Johnson jej2...@gmail.com wrote: Martijn, I had not considered doing something like manufacturedate_dt:[2007-02-13T15:26:37Z TO 2007-02-13T15:26:37Z+1YEAR] does this work? If so, that completely eliminates the need to use the date math parsers, right? On Sun, Jun 12, 2011 at 9:10 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi Jamie, Just letting you know that I've added a comment to SOLR-2523. I ran into a class dependency issue regarding DateMathParser. Martijn On 11 June 2011 16:04, Jamie Johnson jej2...@gmail.com wrote: Awesome Martijn, let me know when you have it committed and I'll check out the latest version. Again thanks! On Sat, Jun 11, 2011 at 8:15 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi James, Good idea! I'll add a getAsFilterQuery method to the patch. Martijn On 6 June 2011 19:32, Jamie Johnson jej2...@gmail.com wrote: Small error: shouldn't be using this.start, but should instead be using Double.parseDouble(this.getValue()) and sdf.parse(count.getValue()) respectively. On Mon, Jun 6, 2011 at 1:16 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks Martijn. I pulled your patch and it looks like what I was looking for. The original FacetField class has a getAsFilterQuery method which returns the criteria to use as an fq parameter. I have logic which does this in my class which works; any chance of getting something like this added to the patch as well?
public static class Numeric extends RangeFacet<Number, Number> { public Numeric(String name, Number start, Number end, Number gap) { super(name, start, end, gap); } public String getAsFilterQuery() { Double end = this.start.doubleValue() + this.gap.doubleValue() - 1; return this.name + ":[" + this.start + " TO " + end + "]"; } } and for dates (there's a parse exception below which I am not doing anything with currently) public String getAsFilterQuery() { RangeFacet.Date dateCount = (RangeFacet.Date) count.getRangeFacet(); DateMathParser parser = new DateMathParser(TimeZone.getDefault(), Locale.getDefault()); SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss"); parser.setNow(dateCount.getStart()); Date end = parser.parseMath(dateCount.getGap()); String startStr = sdf.format(dateCount.getStart()) + "Z"; String endStr = sdf.format(end) + "Z"; String label = startStr + " TO " + endStr; return facetField.getName() + ":[" + label + "]"; } On Fri, Jun 3, 2011 at 7:05 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi Jamie, I don't know why range facets didn't make it into SolrJ. But I've recently opened an issue for this: https://issues.apache.org/jira/browse/SOLR-2523 I hope this will be committed soon. Check the patch out and see if you like it. Martijn On 2 June 2011 18:22, Jamie Johnson jej2...@gmail.com wrote: Currently the range and date faceting in SolrJ acts a bit differently than I would expect. Specifically, range facets aren't parsed at all, and date facets end up generating filter queries which don't have the range, just the lower bound. Is there a reason why SolrJ doesn't support these? I have written some things on my end to handle these and generate filter queries for date ranges of the form dateTime:[start TO end], and I have a function (which I copied from the date faceting) which parses the range facets, but I would prefer not to have to maintain these myself. Is there a plan to implement these?
Also, is there a plan to update FacetField to not have end be a date, perhaps making it a String like start, so we can support date and range queries? -- Met vriendelijke groet, Martijn van Groningen
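The getAsFilterQuery sketches in this thread lean on Solr's DateMathParser plus SimpleDateFormat; the same fq string can be produced with only the JDK's java.time API. The DateRangeFq class below is a hypothetical stand-in, not the patch's code, and it hard-codes a whole-year gap instead of parsing arbitrary Solr date math.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class DateRangeFq {
    // Solr's canonical date representation: ISO timestamp with a literal Z.
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'");

    // Build a filter query like field:[start TO start+years] without
    // DateMathParser. Only supports a gap of whole years.
    static String build(String field, String startIso, int years) {
        ZonedDateTime start = ZonedDateTime.parse(startIso);
        ZonedDateTime end = start.plusYears(years);
        return field + ":[" + FMT.format(start) + " TO " + FMT.format(end) + "]";
    }
}
```

For example, a start of 2007-02-13T15:26:37Z with a one-year gap yields the same range that the manufacturedate_dt example earlier in the thread spells out with +1YEAR date math.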
Re: SolrJ and Range Faceting
Hi Jamie, Just letting you know that I've added a comment to SOLR-2523. I ran into a class dependency issue regarding DateMathParser. Martijn On 11 June 2011 16:04, Jamie Johnson jej2...@gmail.com wrote: Awesome Martijn, let me know when you have it committed and I'll check out the latest version. Again thanks! On Sat, Jun 11, 2011 at 8:15 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi James, Good idea! I'll add a getAsFilterQuery method to the patch. Martijn On 6 June 2011 19:32, Jamie Johnson jej2...@gmail.com wrote: Small error: shouldn't be using this.start, but should instead be using Double.parseDouble(this.getValue()) and sdf.parse(count.getValue()) respectively. On Mon, Jun 6, 2011 at 1:16 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks Martijn. I pulled your patch and it looks like what I was looking for. The original FacetField class has a getAsFilterQuery method which returns the criteria to use as an fq parameter. I have logic which does this in my class which works; any chance of getting something like this added to the patch as well?
public static class Numeric extends RangeFacet<Number, Number> { public Numeric(String name, Number start, Number end, Number gap) { super(name, start, end, gap); } public String getAsFilterQuery() { Double end = this.start.doubleValue() + this.gap.doubleValue() - 1; return this.name + ":[" + this.start + " TO " + end + "]"; } } and for dates (there's a parse exception below which I am not doing anything with currently) public String getAsFilterQuery() { RangeFacet.Date dateCount = (RangeFacet.Date) count.getRangeFacet(); DateMathParser parser = new DateMathParser(TimeZone.getDefault(), Locale.getDefault()); SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss"); parser.setNow(dateCount.getStart()); Date end = parser.parseMath(dateCount.getGap()); String startStr = sdf.format(dateCount.getStart()) + "Z"; String endStr = sdf.format(end) + "Z"; String label = startStr + " TO " + endStr; return facetField.getName() + ":[" + label + "]"; } On Fri, Jun 3, 2011 at 7:05 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi Jamie, I don't know why range facets didn't make it into SolrJ. But I've recently opened an issue for this: https://issues.apache.org/jira/browse/SOLR-2523 I hope this will be committed soon. Check the patch out and see if you like it. Martijn On 2 June 2011 18:22, Jamie Johnson jej2...@gmail.com wrote: Currently the range and date faceting in SolrJ acts a bit differently than I would expect. Specifically, range facets aren't parsed at all, and date facets end up generating filter queries which don't have the range, just the lower bound. Is there a reason why SolrJ doesn't support these? I have written some things on my end to handle these and generate filter queries for date ranges of the form dateTime:[start TO end], and I have a function (which I copied from the date faceting) which parses the range facets, but I would prefer not to have to maintain these myself. Is there a plan to implement these?
Also is there a plan to update FacetField to not have end be a date, perhaps making it a String like start so we can support date and range queries? -- Met vriendelijke groet, Martijn van Groningen
Re: SolrJ and Range Faceting
Hi James, Good idea! I'll add a getAsFilterQuery method to the patch. Martijn On 6 June 2011 19:32, Jamie Johnson jej2...@gmail.com wrote: Small error: shouldn't be using this.start, but should instead be using Double.parseDouble(this.getValue()) and sdf.parse(count.getValue()) respectively. On Mon, Jun 6, 2011 at 1:16 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks Martijn. I pulled your patch and it looks like what I was looking for. The original FacetField class has a getAsFilterQuery method which returns the criteria to use as an fq parameter. I have logic which does this in my class which works; any chance of getting something like this added to the patch as well? public static class Numeric extends RangeFacet<Number, Number> { public Numeric(String name, Number start, Number end, Number gap) { super(name, start, end, gap); } public String getAsFilterQuery() { Double end = this.start.doubleValue() + this.gap.doubleValue() - 1; return this.name + ":[" + this.start + " TO " + end + "]"; } } and for dates (there's a parse exception below which I am not doing anything with currently) public String getAsFilterQuery() { RangeFacet.Date dateCount = (RangeFacet.Date) count.getRangeFacet(); DateMathParser parser = new DateMathParser(TimeZone.getDefault(), Locale.getDefault()); SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss"); parser.setNow(dateCount.getStart()); Date end = parser.parseMath(dateCount.getGap()); String startStr = sdf.format(dateCount.getStart()) + "Z"; String endStr = sdf.format(end) + "Z"; String label = startStr + " TO " + endStr; return facetField.getName() + ":[" + label + "]"; } On Fri, Jun 3, 2011 at 7:05 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi Jamie, I don't know why range facets didn't make it into SolrJ. But I've recently opened an issue for this: https://issues.apache.org/jira/browse/SOLR-2523 I hope this will be committed soon. Check the patch out and see if you like it.
Martijn On 2 June 2011 18:22, Jamie Johnson jej2...@gmail.com wrote: Currently the range and date faceting in SolrJ acts a bit differently than I would expect. Specifically, range facets aren't parsed at all and date facets end up generating filterQueries which don't have the range, just the lower bound. Is there a reason why SolrJ doesn't support these? I have written some things on my end to handle these and generate filterQueries for date ranges of the form dateTime:[start TO end] and I have a function (which I copied from the date faceting) which parses the range facets, but would prefer not to have to maintain these myself. Is there a plan to implement these? Also is there a plan to update FacetField to not have end be a date, perhaps making it a String like start so we can support date and range queries? -- Met vriendelijke groet, Martijn van Groningen -- Met vriendelijke groet, Martijn van Groningen
Re: SolrJ and Range Faceting
Hi Jamie, I don't know why range facets didn't make it into SolrJ. But I've recently opened an issue for this: https://issues.apache.org/jira/browse/SOLR-2523 I hope this will be committed soon. Check the patch out and see if you like it. Martijn On 2 June 2011 18:22, Jamie Johnson jej2...@gmail.com wrote: Currently the range and date faceting in SolrJ acts a bit differently than I would expect. Specifically, range facets aren't parsed at all and date facets end up generating filterQueries which don't have the range, just the lower bound. Is there a reason why SolrJ doesn't support these? I have written some things on my end to handle these and generate filterQueries for date ranges of the form dateTime:[start TO end] and I have a function (which I copied from the date faceting) which parses the range facets, but would prefer not to have to maintain these myself. Is there a plan to implement these? Also is there a plan to update FacetField to not have end be a date, perhaps making it a String like start so we can support date and range queries? -- Met vriendelijke groet, Martijn van Groningen
Re: Result Grouping always returns grouped output
Hi Karel, group.main=true should do the trick. When that is set to true the group.format is always simple. Martijn On 27 May 2011 19:13, kare...@gmail.com kare...@gmail.com wrote: Hello, I am using the latest nightly build of Solr 4.0 and I would like to use grouping/field collapsing while maintaining compatibility with my current parser. I am using the regular web interface to test it, the same commands as in the wiki, just with the field names matching my dataset. Grouping itself works — group=true and group.field return the expected results — but neither group.main=true nor group.format=simple seems to change anything. Do I have to include something special in solrconfig.xml or schema.xml to make the simple output work? Thanks for any hints, K -- Met vriendelijke groet, Martijn van Groningen
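As a stand-alone illustration of the parameters involved (the field name category is hypothetical), the request Karel needs boils down to a query string like the one this sketch assembles:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the grouping request parameters. With
// group.main=true the grouped hits replace the normal document list,
// so the output format is effectively forced to "simple".
public class GroupParamsSketch {

    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");
        p.put("group", "true");
        p.put("group.field", "category"); // hypothetical field name
        p.put("group.main", "true");      // flat, "simple"-style output
        System.out.println(toQueryString(p));
        // prints: q=*:*&group=true&group.field=category&group.main=true
    }
}
```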
Re: [POLL] How do you (like to) do logging with Solr
[ ] I always use the JDK logging as bundled in solr.war, that's perfect [ ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [X] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! On 16 May 2011 11:32, Gora Mohanty g...@mimirtech.com wrote: On Mon, May 16, 2011 at 2:13 PM, Jan Høydahl jan@cominvent.com wrote: [...] Please tick one of the options below with an [X]: [ X] I always use the JDK logging as bundled in solr.war, that's perfect [ ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [ ] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! Regards, Gora -- Met vriendelijke groet, Martijn van Groningen
Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?
[] ASF Mirrors (linked in our release announcements or via the Lucene website) [ X ] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [ X ] I/we build them from source via an SVN/Git checkout. [] Other (someone in your company mirrors them internally or via a downstream project) On 19 January 2011 14:59, Matthew Hall mh...@informatics.jax.org wrote: [X] ASF Mirrors (linked in our release announcements or via the Lucene website) [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [] I/we build them from source via an SVN/Git checkout. [] Other (someone in your company mirrors them internally or via a downstream project) -- Matthew Hall Software Engineer Mouse Genome Informatics mh...@informatics.jax.org (207) 288-6012 -- Met vriendelijke groet, Martijn van Groningen
Re: Field Collapse question
Hi Ken, Skipping null field values when collapsing is not possible with the patch as-is. However, if you want, you can fix this in the patch; it is a really small change. Assuming that you're using the default collapsing algorithm, you can add the following piece of code in the NonAdjacentDocumentCollapser.java file: if (currentValue == null) { continue; } Place it in the doCollapsing method after the following statement: String currentValue = values.lookup[values.order[currentId]]; This makes sure that documents that have no value in the collapse field are not collapsed. Field collapsing has a big impact on your search times and, to a lesser extent, on memory usage. It can increase search times up to 10 times, but this depends on the situation. As indexes get bigger this becomes a bigger problem. Also, using field collapsing in a distributed environment can cause problems. This is because collapse information is not shared between shards, resulting in incorrect collapse results. The only workaround for this problem I know of is to make sure that the groups are distributed evenly between shards and that a group's documents are not spread across shards. Other than that there are no further major issues with this patch. Many people are using this patch in their Solr setups, but it is a patch, so you'll have to keep that in mind. There are efforts to put grouping functionality into Solr (without patching) in SOLR-236's child issues, so keep an eye on these issues. Cheers, Martijn On 3 July 2010 19:20, osocurious2 ken.fos...@realestate.com wrote: I wanted to extend my question some. My original question about collapsing null fields is still open, but in trying to research elsewhere I see a lot of angst about the Field Collapse functionality in general. Can anyone summarize what the current state of affairs is with it? I'm on Solr 1.4, just the latest release build, not any current builds. 
Field Collapse seems to be in my build because I could do single field collapse just fine (hence my null field question). However there seems to be talk of problems with Field Collapse that aren't fixed yet. What kinds of issues are people having? Should I avoid Field Collapse in a production app for now? (tricky because I'm merging my schema with a third party tool schema and they are using Field Collapse). Any insight would be helpful, thanks Ken -- View this message in context: http://lucene.472066.n3.nabble.com/Field-Collapse-question-tp939118p940923.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR-236 Patch
Hi Sam, It seems that the patch is out of sync again with the trunk. Can you try patching with revision 955615? I'll update the patch shortly. Martijn On 24 June 2010 09:49, Amdebirhan, Samson, VF-Group samson.amdebir...@vodafone.com wrote: Hi, Trying to apply the SOLR-236 patch to a trunk I get what follows. Can anyone help me understand what I am missing? . svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk patch -p0 -i SOLR-236-trunk.patch --dry-run patching file solr/src/test/org/apache/solr/search/fieldcollapse/MyDocTermsIndex.java patching file solr/src/java/org/apache/solr/handler/component/CollapseComponent.java patching file solr/src/test/test-files/solr/conf/solrconfig-fieldcollapse.xml patching file solr/src/java/org/apache/solr/search/fieldcollapse/collector/FieldValueCountCollapseCollectorFactory.java patching file solr/src/java/org/apache/solr/search/fieldcollapse/collector/DocumentGroupCountCollapseCollectorFactory.java can't find file to patch at input line 1068 Perhaps you used the wrong -p or --strip option? The text leading up to this was: -- |Index: solr/src/java/org/apache/solr/search/DocSetHitCollector.java |=== |--- solr/src/java/org/apache/solr/search/DocSetHitCollector.java (revision 922957) |+++ solr/src/java/org/apache/solr/search/DocSetHitCollector.java (revision ) . Regards Sam
Re: Field Collapsing SOLR-236
What exactly did not work? Patching, compiling or running it? On 22 June 2010 16:06, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, I tried checking out the latest code (rev 956715); the patch did not work on it. In fact I even tried hunting for the revision mentioned earlier in this thread (i.e. rev 955615) but cannot find it in the repository. (It has revision 955569 followed by revision 955785.) Any pointers?? Regards Raakhi On Tue, Jun 22, 2010 at 2:03 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Oh in that case is the code stable enough to use it for production? - Well, this feature is a patch and I think that says it all. Although bugs are fixed it is definitely an experimental feature and people should keep that in mind when using one of the patches. Does it support features which solr 1.4 normally supports? - As far as I know, yes. am using facets as a workaround but then i am not able to sort on any other field. is there any workaround to support this feature?? - Maybe http://wiki.apache.org/solr/Deduplication prevents adding duplicates to your index, but then you miss the collapse counts and other computed values. On 21 June 2010 09:04, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, Oh in that case is the code stable enough to use it for production? Does it support features which solr 1.4 normally supports? I am using facets as a workaround but then i am not able to sort on any other field. is there any workaround to support this feature?? Regards, Raakhi On Fri, Jun 18, 2010 at 6:14 PM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi Rakhi, The patch is not compatible with 1.4. If you want to work with the trunk, you'll need to get the src from https://svn.apache.org/repos/asf/lucene/dev/trunk/ Martijn On 18 June 2010 13:46, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi Moazzam, Where did u get the src code from?? 
I am downloading it from https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4 and the latest revision in this location is 955469. so applying the latest patch(dated 17th june 2010) on it still generates errors. Any Pointers? Regards, Raakhi On Fri, Jun 18, 2010 at 1:24 AM, Moazzam Khan moazz...@gmail.com wrote: I knew it wasn't me! :) I found the patch just before I read this and applied it to the trunk and it works! Thanks Mark and martijn for all your help! - Moazzam On Thu, Jun 17, 2010 at 2:16 PM, Martijn v Groningen martijn.is.h...@gmail.com wrote: I've added a new patch to the issue, so building the trunk (rev 955615) with the latest patch should not be a problem. Due to recent changes in the Lucene trunk the patch was not compatible. On 17 June 2010 20:20, Erik Hatcher erik.hatc...@gmail.com wrote: On Jun 16, 2010, at 7:31 PM, Mark Diggory wrote: p.s. I'd be glad to contribute our Maven build re-organization back to the community to get Solr properly Mavenized so that it can be distributed and released more often. For us the benefit of this structure is that we will be able to overlay addons such as RequestHandlers and other third party support without having to rebuild Solr from scratch. But you don't have to rebuild Solr from scratch to add a new request handler or other plugins - simply compile your custom stuff into a JAR and put it in solr-home/lib (or point to it with lib in solrconfig.xml). Ideally, a Maven Archetype could be created that would allow one rapidly produce a Solr webapp and fire it up in Jetty in mere seconds. How's that any different than cd example; java -jar start.jar? Or do you mean a Solr client webapp? Finally, with projects such as Bobo, integration with Spring would make configuration more consistent and request significantly less java coding just to add new capabilities everytime someone authors a new RequestHandler. It's one line of config to add a new request handler. 
How many ridiculously ugly confusing lines of Spring XML would it take? The biggest thing I learned about Solr in my work thusfar is that patches like these could be standalone modules in separate projects if it weren't for having to hack the configuration and solrj methods up to adopt them. Which brings me to SolrJ, great API if it would stay generic and have less concern for adding method each time some custom collections and query support for morelikethis or collapseddocs needs to be added. I personally find it silly that we customize SolrJ for all these request handlers anyway. You get a decent navigable data structure back from general SolrJ query requests as it is, there's no need to build in all
Re: collapse exception
I checked your stacktrace and I can't remember putting SolrIndexSearcher.getDocListAndSet(...) in the doQuery(...) method. I guess the patch was modified before it was applied. I think the error occurs when you do a field collapse search with an fq parameter. That is the only reason I can think of why this exception is thrown. When this component become a contrib? Using patch is so annoying Patching is a bit of a hassle. This patch has some changes in the SolrIndexSearcher which make it difficult to make it a contrib or an extension. On 22 June 2010 04:52, Li Li fancye...@gmail.com wrote: I don't know because it's patched by someone else but I can't get his help. When this component become a contrib? Using patch is so annoying 2010/6/22 Martijn v Groningen martijn.is.h...@gmail.com: What version of Solr and which patch are you using? On 21 June 2010 11:46, Li Li fancye...@gmail.com wrote: it says Either filter or filterList may be set in the QueryCommand, but not both. I am a newbie of solr and have no idea of the exception. What's wrong with it? thank you. java.lang.IllegalArgumentException: Either filter or filterList may be set in the QueryCommand, but not both. 
at org.apache.solr.search.SolrIndexSearcher$QueryCommand.setFilter(SolrIndexSearcher.java:1711) at org.apache.solr.search.SolrIndexSearcher.getDocListAndSet(SolrIndexSearcher.java:1286) at org.apache.solr.search.fieldcollapse.NonAdjacentDocumentCollapser.doQuery(NonAdjacentDocumentCollapser.java:205) at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.executeCollapse(AbstractDocumentCollapser.java:246) at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:173) at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:174) at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:203) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) at java.lang.Thread.run(Thread.java:619) -- Met vriendelijke groet, Martijn van Groningen -- Met vriendelijke groet, Martijn van Groningen
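The IllegalArgumentException above comes from an invariant in SolrIndexSearcher's QueryCommand: a filter and a filterList may not both be set on the same command. This minimal sketch (not Solr's actual code; field types simplified to Object) shows that kind of mutually exclusive setter:

```java
// Minimal sketch of the invariant behind the exception in this thread:
// two fields where setting one while the other is non-null is rejected.
public class QueryCommandSketch {
    private Object filter;
    private Object filterList;

    public void setFilter(Object f) {
        if (f != null && filterList != null)
            throw new IllegalArgumentException(
                "Either filter or filterList may be set in the QueryCommand, but not both.");
        this.filter = f;
    }

    public void setFilterList(Object fl) {
        if (fl != null && filter != null)
            throw new IllegalArgumentException(
                "Either filter or filterList may be set in the QueryCommand, but not both.");
        this.filterList = fl;
    }
}
```

A patched code path that calls setFilter(...) after a filterList was already populated (e.g. from an fq parameter) trips this check, which matches the stack trace above.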
Re: Field Collapsing SOLR-236
Oh in that case is the code stable enough to use it for production? - Well, this feature is a patch and I think that says it all. Although bugs are fixed it is definitely an experimental feature and people should keep that in mind when using one of the patches. Does it support features which solr 1.4 normally supports? - As far as I know, yes. am using facets as a workaround but then i am not able to sort on any other field. is there any workaround to support this feature?? - Maybe http://wiki.apache.org/solr/Deduplication prevents adding duplicates to your index, but then you miss the collapse counts and other computed values. On 21 June 2010 09:04, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, Oh in that case is the code stable enough to use it for production? Does it support features which solr 1.4 normally supports? I am using facets as a workaround but then i am not able to sort on any other field. is there any workaround to support this feature?? Regards, Raakhi On Fri, Jun 18, 2010 at 6:14 PM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi Rakhi, The patch is not compatible with 1.4. If you want to work with the trunk, you'll need to get the src from https://svn.apache.org/repos/asf/lucene/dev/trunk/ Martijn On 18 June 2010 13:46, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi Moazzam, Where did u get the src code from?? I am downloading it from https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4 and the latest revision in this location is 955469. so applying the latest patch (dated 17th June 2010) on it still generates errors. Any Pointers? Regards, Raakhi On Fri, Jun 18, 2010 at 1:24 AM, Moazzam Khan moazz...@gmail.com wrote: I knew it wasn't me! :) I found the patch just before I read this and applied it to the trunk and it works! Thanks Mark and martijn for all your help! 
- Moazzam On Thu, Jun 17, 2010 at 2:16 PM, Martijn v Groningen martijn.is.h...@gmail.com wrote: I've added a new patch to the issue, so building the trunk (rev 955615) with the latest patch should not be a problem. Due to recent changes in the Lucene trunk the patch was not compatible. On 17 June 2010 20:20, Erik Hatcher erik.hatc...@gmail.com wrote: On Jun 16, 2010, at 7:31 PM, Mark Diggory wrote: p.s. I'd be glad to contribute our Maven build re-organization back to the community to get Solr properly Mavenized so that it can be distributed and released more often. For us the benefit of this structure is that we will be able to overlay addons such as RequestHandlers and other third party support without having to rebuild Solr from scratch. But you don't have to rebuild Solr from scratch to add a new request handler or other plugins - simply compile your custom stuff into a JAR and put it in solr-home/lib (or point to it with lib in solrconfig.xml). Ideally, a Maven Archetype could be created that would allow one rapidly produce a Solr webapp and fire it up in Jetty in mere seconds. How's that any different than cd example; java -jar start.jar? Or do you mean a Solr client webapp? Finally, with projects such as Bobo, integration with Spring would make configuration more consistent and request significantly less java coding just to add new capabilities everytime someone authors a new RequestHandler. It's one line of config to add a new request handler. How many ridiculously ugly confusing lines of Spring XML would it take? The biggest thing I learned about Solr in my work thusfar is that patches like these could be standalone modules in separate projects if it weren't for having to hack the configuration and solrj methods up to adopt them. Which brings me to SolrJ, great API if it would stay generic and have less concern for adding method each time some custom collections and query support for morelikethis or collapseddocs needs to be added. 
I personally find it silly that we customize SolrJ for all these request handlers anyway. You get a decent navigable data structure back from general SolrJ query requests as it is, there's no need to build in all these convenience methods specific to all the Solr componetry. Sure, it's convenient, but it's a maintenance headache and as you say, not generic. But hacking configuration is reasonable, I think, for adding in plugins. I guess you're aiming for some kind of Spring-like auto-discovery of plugins? Yeah, maybe, but I'm pretty -1 on Spring coming into Solr. It's overkill and ugly, IMO. But you like it :) And that's cool by me, to each their own. Oh, and Hi Mark! :) Erik -- Met vriendelijke groet, Martijn van Groningen -- Met vriendelijke groet, Martijn van Groningen
Re: collapse exception
What version of Solr and which patch are you using? On 21 June 2010 11:46, Li Li fancye...@gmail.com wrote: it says Either filter or filterList may be set in the QueryCommand, but not both. I am newbie of solr and have no idea of the exception. What's wrong with it? thank you. java.lang.IllegalArgumentException: Either filter or filterList may be set in the QueryCommand, but not both. at org.apache.solr.search.SolrIndexSearcher$QueryCommand.setFilter(SolrIndexSearcher.java:1711) at org.apache.solr.search.SolrIndexSearcher.getDocListAndSet(SolrIndexSearcher.java:1286) at org.apache.solr.search.fieldcollapse.NonAdjacentDocumentCollapser.doQuery(NonAdjacentDocumentCollapser.java:205) at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.executeCollapse(AbstractDocumentCollapser.java:246) at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:173) at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:174) at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:203) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) at java.lang.Thread.run(Thread.java:619) -- Met vriendelijke groet, Martijn van Groningen
Re: Field Collapsing SOLR-236
Hi Rakhi, The patch is not compatible with 1.4. If you want to work with the trunk, you'll need to get the src from https://svn.apache.org/repos/asf/lucene/dev/trunk/ Martijn On 18 June 2010 13:46, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi Moazzam, Where did u get the src code from?? I am downloading it from https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4 and the latest revision in this location is 955469. so applying the latest patch (dated 17th June 2010) on it still generates errors. Any Pointers? Regards, Raakhi On Fri, Jun 18, 2010 at 1:24 AM, Moazzam Khan moazz...@gmail.com wrote: I knew it wasn't me! :) I found the patch just before I read this and applied it to the trunk and it works! Thanks Mark and martijn for all your help! - Moazzam On Thu, Jun 17, 2010 at 2:16 PM, Martijn v Groningen martijn.is.h...@gmail.com wrote: I've added a new patch to the issue, so building the trunk (rev 955615) with the latest patch should not be a problem. Due to recent changes in the Lucene trunk the patch was not compatible. On 17 June 2010 20:20, Erik Hatcher erik.hatc...@gmail.com wrote: On Jun 16, 2010, at 7:31 PM, Mark Diggory wrote: p.s. I'd be glad to contribute our Maven build re-organization back to the community to get Solr properly Mavenized so that it can be distributed and released more often. For us the benefit of this structure is that we will be able to overlay addons such as RequestHandlers and other third party support without having to rebuild Solr from scratch. But you don't have to rebuild Solr from scratch to add a new request handler or other plugins - simply compile your custom stuff into a JAR and put it in solr-home/lib (or point to it with lib in solrconfig.xml). Ideally, a Maven Archetype could be created that would allow one rapidly produce a Solr webapp and fire it up in Jetty in mere seconds. How's that any different than cd example; java -jar start.jar? Or do you mean a Solr client webapp? 
Finally, with projects such as Bobo, integration with Spring would make configuration more consistent and request significantly less java coding just to add new capabilities everytime someone authors a new RequestHandler. It's one line of config to add a new request handler. How many ridiculously ugly confusing lines of Spring XML would it take? The biggest thing I learned about Solr in my work thusfar is that patches like these could be standalone modules in separate projects if it weren't for having to hack the configuration and solrj methods up to adopt them. Which brings me to SolrJ, great API if it would stay generic and have less concern for adding method each time some custom collections and query support for morelikethis or collapseddocs needs to be added. I personally find it silly that we customize SolrJ for all these request handlers anyway. You get a decent navigable data structure back from general SolrJ query requests as it is, there's no need to build in all these convenience methods specific to all the Solr componetry. Sure, it's convenient, but it's a maintenance headache and as you say, not generic. But hacking configuration is reasonable, I think, for adding in plugins. I guess you're aiming for some kind of Spring-like auto-discovery of plugins? Yeah, maybe, but I'm pretty -1 on Spring coming into Solr. It's overkill and ugly, IMO. But you like it :) And that's cool by me, to each their own. Oh, and Hi Mark! :) Erik -- Met vriendelijke groet, Martijn van Groningen
Re: Field Collapsing SOLR-236
I've added a new patch to the issue, so building the trunk (rev 955615) with the latest patch should not be a problem. Due to recent changes in the Lucene trunk the patch was not compatible. On 17 June 2010 20:20, Erik Hatcher erik.hatc...@gmail.com wrote: On Jun 16, 2010, at 7:31 PM, Mark Diggory wrote: p.s. I'd be glad to contribute our Maven build re-organization back to the community to get Solr properly Mavenized so that it can be distributed and released more often. For us the benefit of this structure is that we will be able to overlay addons such as RequestHandlers and other third party support without having to rebuild Solr from scratch. But you don't have to rebuild Solr from scratch to add a new request handler or other plugins - simply compile your custom stuff into a JAR and put it in solr-home/lib (or point to it with lib in solrconfig.xml). Ideally, a Maven Archetype could be created that would allow one rapidly produce a Solr webapp and fire it up in Jetty in mere seconds. How's that any different than cd example; java -jar start.jar? Or do you mean a Solr client webapp? Finally, with projects such as Bobo, integration with Spring would make configuration more consistent and request significantly less java coding just to add new capabilities everytime someone authors a new RequestHandler. It's one line of config to add a new request handler. How many ridiculously ugly confusing lines of Spring XML would it take? The biggest thing I learned about Solr in my work thusfar is that patches like these could be standalone modules in separate projects if it weren't for having to hack the configuration and solrj methods up to adopt them. Which brings me to SolrJ, great API if it would stay generic and have less concern for adding method each time some custom collections and query support for morelikethis or collapseddocs needs to be added. I personally find it silly that we customize SolrJ for all these request handlers anyway. 
You get a decent navigable data structure back from general SolrJ query requests as it is, there's no need to build in all these convenience methods specific to all the Solr componetry. Sure, it's convenient, but it's a maintenance headache and as you say, not generic. But hacking configuration is reasonable, I think, for adding in plugins. I guess you're aiming for some kind of Spring-like auto-discovery of plugins? Yeah, maybe, but I'm pretty -1 on Spring coming into Solr. It's overkill and ugly, IMO. But you like it :) And that's cool by me, to each their own. Oh, and Hi Mark! :) Erik -- Met vriendelijke groet, Martijn van Groningen
Re: question about the fieldCollapseCache
The fieldCollapseCache should not be used as it is now; it uses too much memory. It stores any information relevant to a field collapse search: document collapse counts, collapsed document ids / fields, collapsed docset and uncollapsed docset (everything per unique search). So the memory usage will grow for each unique query (and fast, with all this information). So it's best, I think, to disable this cache for now. Martijn On 8 June 2010 19:05, Jean-Sebastien Vachon js.vac...@videotron.ca wrote: Hi All, I've been running some tests using 6 shards, each one containing about 1 million documents. Each shard is running in its own virtual machine with 7 GB of ram (5 GB allocated to the JVM). After about 1100 unique queries the shards start to struggle and run out of memory. I've reduced all other caches without significant impact. When I completely remove the fieldCollapseCache, the server can keep up for hours and use only 2 GB of ram. (I'm even considering returning to a 32-bit JVM.) The size of the fieldCollapseCache was set to 5000 items. How can 5000 items eat 3 GB of ram? Can someone tell me what is put in this cache? Has anyone experienced this kind of problem? I am running Solr 1.4.1 with patch 236. All requests are collapsing on a single field (pint) and collapse.maxdocs set to 200 000. Thanks for any hints...
Re: question about the fieldCollapseCache
I agree. I'll add this information to the wiki. On 9 June 2010 14:32, Jean-Sebastien Vachon js.vac...@videotron.ca wrote: ok great. I believe this should be mentioned in the wiki. Later On 2010-06-09, at 4:06 AM, Martijn v Groningen wrote: The fieldCollapseCache should not be used as it is now, it uses too much memory. It stores any information relevant for a field collapse search. Like document collapse counts, collapsed document ids / fields, collapsed docset and uncollapsed docset (everything per unique search). So the memory usage will grow for each unique query (and fast with all this information). So its best I think to disable this cache for now. Martijn On 8 June 2010 19:05, Jean-Sebastien Vachon js.vac...@videotron.ca wrote: Hi All, I've been running some tests using 6 shards each one containing about 1 millions documents. Each shard is running in its own virtual machine with 7 GB of ram (5GB allocated to the JVM). After about 1100 unique queries the shards start to struggle and run out of memory. I've reduced all other caches without significant impact. When I remove completely the fieldCollapseCache, the server can keep up for hours and use only 2 GB of ram. (I'm even considering returning to a 32 bits JVM) The size of the fieldCollapseCache was set to 5000 items. How can 5000 items eat 3 GB of ram? Can someone tell me what is put in this cache? Has anyone experienced this kind of problem? I am running Solr 1.4.1 with patch 236. All requests are collapsing on a single field (pint) and collapse.maxdocs set to 200 000. Thanks for any hints... -- Met vriendelijke groet, Martijn van Groningen
Re: Applying collapse patch
The trunk should work with the latest patch (SOLR-236-trunk.patch). Did the patching go successfully? What compilation errors do you get?

On 28 May 2010 11:10, Sophie M. sop...@beezik.com wrote: Ok, I will have a look at the comments and I will post if necessary. Thanks ^^ -- View this message in context: http://lucene.472066.n3.nabble.com/Applying-collapse-patch-tp851121p851170.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
Re: Applying collapse patch
Have you executed ant example after building? (Assuming that this is the example Solr.)

On 28 May 2010 12:17, Sophie M. sop...@beezik.com wrote: It is ok for applying the patch, thanks Martijn. When I start Solr I get these logs in my console:

C:\Users\Sophie\workspace\lucene-solr\lucene-solr\solr\solr>java -jar start.jar
2010-05-28 12:09:30.037:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
2010-05-28 12:09:30.178:INFO::jetty-6.1.22
2010-05-28 12:09:30.208:INFO::Opened C:\Users\Sophie\workspace\lucene-solr\lucene-solr\solr\solr\logs\2010_05_28.request.log
2010-05-28 12:09:30.218:INFO::Started socketconnec...@0.0.0.0:8983

and I have a 404 error on http://localhost:8983/solr. I am investigating ^^ regards -- View this message in context: http://lucene.472066.n3.nabble.com/Applying-collapse-patch-tp851121p851308.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
Re: Field Collapsing SOLR-236
Hi Blargy, The latest patch is not compatible with 1.4; I believe that the latest field-collapse-5.patch file is compatible with 1.4. The file should at least compile against the 1.4 trunk. I'm not sure how the performance is. Martijn

On 25 March 2010 01:49, Dennis Gearon gear...@sbcglobal.net wrote: Boy, I hope that field collapsing works! I'm planning on using it heavily. Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Wed, 3/24/10, blargy zman...@hotmail.com wrote: From: blargy zman...@hotmail.com Subject: Field Collapsing SOLR-236 To: solr-user@lucene.apache.org Date: Wednesday, March 24, 2010, 12:17 PM Has anyone had any luck with the field collapsing patch (SOLR-236) with Solr 1.4? I tried patching my version of 1.4 with no such luck. Thanks -- View this message in context: http://old.nabble.com/Field-Collapsing-SOLR-236-tp28019949p28019949.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
Re: Dedupe of document results at query-time
This manner of detecting duplicates at query time is really what field collapsing does, so I suggest you look into that. As far as I know there isn't a function query that does what you described in your example. Cheers, Martijn

On 23 January 2010 12:31, Peter S pete...@hotmail.com wrote: Hi, I wonder if someone might be able to shed some insight into this problem: Is it possible, and/or what is the best/accepted way, to achieve deduplication of documents by field at query time? For example, let's say an index contains:

Doc1 host:Host1 time:1 Sept 09 appname:activePDF
Doc2 host:Host1 time:2 Sept 09 appname:activePDF
Doc3 host:Host1 time:3 Sept 09 appname:activePDF

Can a query be constructed that would return only 1 of these documents based on appname (it doesn't really matter which one)? i.e.: match on host:Host1, ignore time, dedupe on appname:activePDF. Is this possible? Would FunctionQuery be helpful here, maybe? Am I actually talking about field collapsing? Many thanks, Peter

-- Met vriendelijke groet, Martijn van Groningen
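As a hypothetical illustration of Peter's example, a SOLR-236-style request that collapses on appname could be built like this. The host/time/appname fields come from his message; the host, port, core path and collapse parameter values are placeholders, not something confirmed in this thread:

```python
from urllib.parse import urlencode

# Sketch only: collapse.field / collapse.threshold are the SOLR-236 patch
# parameters discussed in these threads; localhost:8983 is a placeholder.
params = {
    "q": "host:Host1",              # match on host, ignore time
    "collapse.field": "appname",    # keep one result per unique appname
    "collapse.threshold": "1",      # ...and only the most relevant one
    "fl": "host,time,appname",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

With Docs 1-3 above, such a request would return a single document for appname:activePDF, which is exactly the query-time dedupe Peter asks for.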
Re: Solr 1.4 Field collapsing - What are the steps for applying the SOLR-236 patch?
I wouldn't use the patches of the sub-issues right now, as they are still under development (they are currently a proof of concept). I also think that the latest patch in SOLR-236 is currently the best option. There are some memory-related problems with the patch that have to do with caching: the fieldCollapseCache requires a lot of memory (best not to use it right now), and the filterCache becomes quite large as well. Depending on the size of your corpus you may need to increase your heap size and experiment with that. Martijn

2010/1/12 Joe Calderon calderon@gmail.com: it seems to be in flux right now as the solr developers slowly make improvements and ingest the various pieces into the solr trunk. I think your best bet might be to use the 12/24 patch and fix any errors where it doesn't apply cleanly. I'm using solr trunk r892336 with the 12/24 patch --joe

On 01/11/2010 08:48 PM, Kelly Taylor wrote: Hi, Is there a step-by-step guide for applying the patch for SOLR-236 to enable field collapsing in Solr 1.4? Thanks, Kelly -- Met vriendelijke groet, Martijn van Groningen
Re: Field Collapsing - disable cache
The latest SOLR-236.patch is for the trunk; I have updated the latest patch so it should now apply without conflicts. If I remember correctly the latest field-collapse-5.patch should work for 1.4, but it doesn't for the trunk.

2009/12/23 r...@intelcompute.com: Sorry, that was when trying to patch the 1.4 branch. Attempting to patch the trunk gives...

patching file src/test/test-files/fieldcollapse/testResponse.xml
patching file src/test/org/apache/solr/BaseDistributedSearchTestCase.java
Hunk #2 FAILED at 502.
1 out of 2 hunks FAILED -- saving rejects to file src/test/org/apache/solr/BaseDistributedSearchTestCase.java.rej
patching file src/test/org/apache/solr/search/fieldcollapse/FieldCollapsingIntegrationTest.java

btw, when is trunk actually updated?

On Wed 23/12/09 11:53, r...@intelcompute.com wrote: I'm currently trying to patch against trunk, using SOLR-236.patch from 18/12/2009, but getting the following error...

[ solr]$ patch -p0 < SOLR-236.patch
patching file src/test/test-files/solr/conf/solrconfig-fieldcollapse.xml
patching file src/test/test-files/solr/conf/schema-fieldcollapse.xml
patching file src/test/test-files/solr/conf/solrconfig.xml
patching file src/test/test-files/fieldcollapse/testResponse.xml
can't find file to patch at input line 787
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|
|Property changes on: src/test/test-files/fieldcollapse/testResponse.xml
|___________________________________________________________________
|Added: svn:keywords
|   + Date Author Id Revision HeadURL
|Added: svn:eol-style
|   + native
|
|Index: src/test/org/apache/solr/BaseDistributedSearchTestCase.java
|===================================================================
|--- src/test/org/apache/solr/BaseDistributedSearchTestCase.java (revision 891214)
|+++ src/test/org/apache/solr/BaseDistributedSearchTestCase.java (working copy)
--------------------------
File to patch:

any suggestions, or should I checkout the 1.4 branch instead? Can't remember what I did last time to get field-collapse-5.patch working successfully.
On Tue 22/12/09 22:43, Lance Norskog wrote: To avoid this possible bug, you could change the cache to only have a few entries.

On Tue, Dec 22, 2009 at 6:34 AM, Martijn v Groningen wrote: In the latest patch some changes were made on the configuration side, but if you add the CollapseComponent to the conf no field collapse cache should be enabled. If not, let me know. Martijn

2009/12/22: On Tue 22/12/09 12:28, Martijn v Groningen wrote: Hi Rob, What patch are you actually using from SOLR-236? Martijn

2009/12/22: I've tried both the whole fieldCollapsing tag and just the fieldCollapseCache tag inside it; both cause the error. I guess I can just set size, initialSize, and autowarmCount to 0?

On Tue 22/12/09 11:17, Toby Cole wrote: Which elements did you comment out? It could be the case that you need to get rid of the entire fieldCollapsing element, not just the fieldCollapsingCache element. (Disclaimer: I've not used field collapsing in anger before :) Toby.

On 22 Dec 2009, at 11:09, wrote: That's what I assumed, but I'm getting the following error with it commented out:

MESSAGE null
java.lang.NullPointerException
at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.createDocumentCollapseResult(AbstractDocumentCollapser.java:276)
at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.executeCollapse(AbstractDocumentCollapser.java:249)
at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:172)
at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:173)
at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:336)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:239)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at org.apache.catalina.core.ApplicationFilterChain.doFilter
Re: Field Collapsing - disable cache
How have you configured field collapsing? I have field collapsing configured without caching like this:

<searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent"/>

and with caching like this:

<searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent">
  <fieldCollapseCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
</searchComponent>

In both situations I don't get an NPE. Also, the line (in the patch) where the exception occurs seems an unlikely place for an NPE:

if (fieldCollapseCache != null) {
  fieldCollapseCache.put(currentCacheKey, new CacheValue(result, collectors, collapseContext)); // line 276
}

Does the exception occur immediately after the first search? Martijn

2009/12/23 r...@intelcompute.com: Still seeing the same error when trying to comment out the fieldCollapsing block, or even just the fieldCollapseCache block inside it to disable the cache. Fresh trunk and latest patch.

message null
java.lang.NullPointerException
at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.createDocumentCollapseResult(AbstractDocumentCollapser.java:276)
at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.executeCollapse(AbstractDocumentCollapser.java:249)
at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:172)
at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:173)
at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:336)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:239)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
at java.lang.Thread.run(Thread.java:636)

On Wed 23/12/09 13:26, r...@intelcompute.com wrote: Thanks, that latest update to the patch works fine now.

On Wed 23/12/09 13:13, Martijn v Groningen wrote: Latest SOLR-236.patch is for the trunk [...]
Re: Field Collapsing - disable cache
Hi Rob, What patch are you actually using from SOLR-236? Martijn

2009/12/22 r...@intelcompute.com: I've tried both the whole fieldCollapsing tag and just the fieldCollapseCache tag inside it; both cause the error. I guess I can just set size, initialSize, and autowarmCount to 0?

On Tue 22/12/09 11:17, Toby Cole wrote: Which elements did you comment out? It could be the case that you need to get rid of the entire fieldCollapsing element, not just the fieldCollapsingCache element. (Disclaimer: I've not used field collapsing in anger before :) Toby.

On 22 Dec 2009, at 11:09, wrote: That's what I assumed, but I'm getting the following error with it commented out:

MESSAGE null
java.lang.NullPointerException
at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.createDocumentCollapseResult(AbstractDocumentCollapser.java:276)
at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.executeCollapse(AbstractDocumentCollapser.java:249)
at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:172)
at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:173)
at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:336)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:239)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
at java.lang.Thread.run(Thread.java:636)

On Tue 22/12/09 11:02, Toby Cole wrote: If you take out the fieldCollapsing/fieldCollapseCache element in your config the fieldcollapse component will not use a cache. From http://wiki.apache.org/solr/FieldCollapsing#line-63: If the field collapse cache is not configured then the field collapse logic will not be cached. Regards, Toby.

On 22 Dec 2009, at 10:56, wrote: my solrconfig can be seen at http://www.intelcompute.com/solrconfig.xml

On Tue 22/12/09 10:51, wrote: Is it possible to disable the field collapsing cache? I'm trying to perform some speed tests, and have managed to comment out the filter, queryResult, and document caches successfully. on 1.5 ... ... collapse facet tvComponent ...

- Message sent via Atmail Open - http://atmail.org/
Re: Field Collapsing - disable cache
In the latest patch some changes were made on the configuration side, but if you add the CollapseComponent to the conf no field collapse cache should be enabled. If not, let me know. Martijn

2009/12/22 r...@intelcompute.com: On Tue 22/12/09 12:28, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi Rob, What patch are you actually using from SOLR-236? Martijn [...]
Re: Results after using Field Collapsing are not matching the results without using Field Collapsing
I would not expect the Solr 1.4 build to be the cause of the problem. Just out of curiosity, does the same happen when collapse.threshold=1?

2009/12/11 Varun Gupta varun.vgu...@gmail.com: Here is the field type configuration of ctype:

<field name="ctype" type="integer" indexed="true" stored="true" omitNorms="true"/>

In solrconfig.xml, this is how I am enabling field collapsing:

<searchComponent name="query" class="org.apache.solr.handler.component.CollapseComponent"/>

Apart from this, I made no changes in solrconfig.xml for field collapse. I am currently not using the field collapse cache. I have applied the patch on the Solr 1.4 build. I am not using the latest Solr nightly build. Can that cause any problem? -- Thanks Varun Gupta

On Fri, Dec 11, 2009 at 3:44 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote: I tried to reproduce a similar situation here, but I got the expected and correct results. Those three documents that you saw in your first search result should be the first in your second search result (unless the index or the sort changes) when you fq on that specific category. I'm not sure what is causing this problem. Can you give me some more information, like the field type configuration for the ctype field and how you have configured field collapsing? I did find another problem to do with field collapse caching: the collapse.threshold and collapse.maxdocs parameters are not taken into account when caching, which is of course wrong because they do matter when collapsing. Based on the information you have given me, this caching problem is not the cause of the situation you have. I will update the patch that fixes this problem shortly. Martijn

2009/12/10 Varun Gupta varun.vgu...@gmail.com: Hi Martijn, I am not sending the collapse parameters for the second query. Here are the queries I am using:

*When using field collapsing (searching over all categories):*
spellcheck=true&collapse.info.doc=true&facet=true&collapse.threshold=3&facet.mincount=1&spellcheck.q=weight+loss&collapse.facet=before&wt=xml&f.content.hl.snippets=2&hl=true&version=2.2&rows=20&collapse.field=ctype&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&collapse.info.count=false&facet.field=ctype&qt=contentsearch

(categories are represented by the ctype field above)

*Without using field collapsing:*
spellcheck=true&facet=true&facet.mincount=1&spellcheck.q=weight+loss&wt=xml&hl=true&rows=10&version=2.2&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&facet.field=ctype&qt=contentsearch

I append fq=ctype:1 to the above queries when trying to get results for a particular category. -- Thanks Varun Gupta

On Thu, Dec 10, 2009 at 5:58 PM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi Varun, Can you send the whole requests (with params) that you send to Solr for both queries? In your situation the collapse parameters only have to be used for the first query and not the second query. Martijn

2009/12/10 Varun Gupta varun.vgu...@gmail.com: Hi, I have documents under 6 different categories. While searching, I want to show 3 documents from each category along with a link to see all the documents under a single category. I decided to use field collapsing so that I don't have to make 6 queries (one for each category). Currently I am using the field collapsing patch uploaded on 29th Nov. Now, the results that come back when using field collapsing do not match the results for a single category. For example, for category C1 I get results R1, R2 and R3 using field collapsing, but when I look at results only from category C1 (without field collapsing) these results are nowhere in the first 10 results. Am I doing something wrong, or am I using field collapsing for the wrong feature? I am using the following field collapsing parameters while querying:

collapse.field=category
collapse.facet=before
collapse.threshold=3

-- Thanks Varun Gupta -- Met vriendelijke groet, Martijn van Groningen
Re: Results after using Field Collapsing are not matching the results without using Field Collapsing
Hi Varun, Can you send the whole requests (with params) that you send to Solr for both queries? In your situation the collapse parameters only have to be used for the first query and not the second query. Martijn

2009/12/10 Varun Gupta varun.vgu...@gmail.com: Hi, I have documents under 6 different categories. While searching, I want to show 3 documents from each category along with a link to see all the documents under a single category. I decided to use field collapsing so that I don't have to make 6 queries (one for each category). Currently I am using the field collapsing patch uploaded on 29th Nov. Now, the results that come back when using field collapsing do not match the results for a single category. For example, for category C1 I get results R1, R2 and R3 using field collapsing, but when I look at results only from category C1 (without field collapsing) these results are nowhere in the first 10 results. Am I doing something wrong, or am I using field collapsing for the wrong feature? I am using the following field collapsing parameters while querying:

collapse.field=category
collapse.facet=before
collapse.threshold=3

-- Thanks Varun Gupta -- Met vriendelijke groet, Martijn van Groningen
Re: Results after using Field Collapsing are not matching the results without using Field Collapsing
I tried to reproduce a similar situation here, but I got the expected and correct results. Those three documents that you saw in your first search result should be the first in your second search result (unless the index or the sort changes) when you fq on that specific category. I'm not sure what is causing this problem. Can you give me some more information, like the field type configuration for the ctype field and how you have configured field collapsing? I did find another problem to do with field collapse caching: the collapse.threshold and collapse.maxdocs parameters are not taken into account when caching, which is of course wrong because they do matter when collapsing. Based on the information you have given me, this caching problem is not the cause of the situation you have. I will update the patch that fixes this problem shortly. Martijn

2009/12/10 Varun Gupta varun.vgu...@gmail.com: Hi Martijn, I am not sending the collapse parameters for the second query. Here are the queries I am using: [...]

On Thu, Dec 10, 2009 at 5:58 PM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi Varun, Can you send the whole requests (with params) that you send to Solr for both queries? In your situation the collapse parameters only have to be used for the first query and not the second query. Martijn

2009/12/10 Varun Gupta varun.vgu...@gmail.com: Hi, I have documents under 6 different categories. [...]

-- Met vriendelijke groet, Martijn van Groningen
Re: Grouping
Field collapsing has some aggregation functions like sum() and avg(), but the statistics are computed over collapse groups instead of over all documents with the same field value. A collapse group contains the documents that were not relevant enough to end up in the search result (the collapsed documents) plus one or more documents that were relevant enough to be displayed in the search result; this number is controlled by the collapse.threshold parameter, which defaults to one. The statistics are calculated over the collapsed documents only, so it is not exactly the same as a SQL GROUP BY. You can however get similar results when collapse.threshold is one by adding the field value (e.g. price) of the most relevant document to the aggregated statistic. Of course you will have to do this yourself on the client side. Hope this clarifies the field collapse functionality a bit. Martijn

2009/12/4 Otis Gospodnetic otis_gospodne...@yahoo.com: Not out of the box. You could group by using SOLR-236 perhaps? Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch

- Original Message From: Bruno brun...@gmail.com To: solr-user@lucene.apache.org Sent: Fri, December 4, 2009 1:08:59 PM Subject: Grouping Is there a way to make a group by or distinct query? -- Bruno Morelli Vargas Mail: brun...@gmail.com Msn: brun...@hotmail.com Icq: 165055101 Skype: morellibmv
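Martijn's client-side step can be sketched in a few lines: with collapse.threshold=1, Solr's sum() covers only the collapsed documents of a group, so the client adds the field value of the one visible document to approximate a SQL GROUP BY total. The numbers below are made up for illustration:

```python
# Sketch of the client-side correction described above: combine Solr's
# per-group sum over *collapsed* documents with the field value(s) of the
# visible (most relevant) document(s) in that group.
def group_total(collapsed_sum, visible_doc_values):
    """collapsed_sum: the sum() Solr computed over the group's collapsed docs;
    visible_doc_values: field values of the document(s) shown in the results."""
    return collapsed_sum + sum(visible_doc_values)

# A group whose collapsed docs sum to 30.0 plus one visible doc priced 12.5:
assert group_total(30.0, [12.5]) == 42.5
```

This only matches a true GROUP BY when collapse.threshold is one; with a higher threshold several documents per group appear in the results and are all excluded from Solr's aggregate.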
Re: Deduplication in 1.4
Field collapsing has been used by many in their production environment. Over the last few months the stability of the patch grew as quite a few bugs were fixed. The only big feature missing currently is caching of the collapsing algorithm. I'm currently working on that and I will put it in a new patch in the coming days. So yes, the patch is very near being production ready. Martijn 2009/11/26 KaktuChakarabati jimmoe...@gmail.com: Hey Otis, Yep, I realized this myself after playing some with the dedupe feature yesterday. So it does look like field collapsing is what I need, pretty much. Any idea on how close it is to being production-ready? Thanks, -Chak Otis Gospodnetic wrote: Hi, As far as I know, the point of deduplication in Solr ( http://wiki.apache.org/solr/Deduplication ) is to detect a duplicate document before indexing it, in order to avoid duplicates in the index in the first place. What you are describing is closer to the field collapsing patch in SOLR-236. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: KaktuChakarabati jimmoe...@gmail.com To: solr-user@lucene.apache.org Sent: Tue, November 24, 2009 5:29:00 PM Subject: Deduplication in 1.4 Hey, I've been trying to find some documentation on using this feature in 1.4 but the Wiki page is a little sparse. In specific, here's what I'm trying to do: I have a field, say 'duplicate_group_id', that I'll populate based on some offline document deduplication process I have. All I want is for Solr to compute a 'duplicate_signature' field based on this one at update time, so that when I search for documents later, all documents with the same original 'duplicate_group_id' value will be rolled up (e.g. I'll just get the first one that came back according to relevancy).
I enabled the deduplication processor and put it into the updater, but I'm not seeing any difference in returned results (i.e. results with the same duplicate_id are returned separately). Is there anything I need to supply at query time for this to take effect? What should the behaviour be? Is there any working example of this? Anything will be helpful. Thanks, Chak -- View this message in context: http://old.nabble.com/Deduplication-in-1.4-tp26504403p26504403.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://old.nabble.com/Deduplication-in-1.4-tp26504403p26522386.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Deduplication in 1.4
Two sites that use field collapsing: 1) www.ilocal.nl 2) www.welke.nl I'm not sure what you mean by double-tripping? The sites mentioned do not have performance problems caused by field collapsing. Field collapsing currently only supports quasi-distributed field collapsing (as I have described on the Solr wiki). Currently I don't know of a distributed field-collapsing algorithm that works properly and does not influence the search time in such a way that the search becomes slow. Martijn 2009/11/26 Otis Gospodnetic otis_gospodne...@yahoo.com: Hi Martijn, - Original Message From: Martijn v Groningen martijn.is.h...@gmail.com To: solr-user@lucene.apache.org Sent: Thu, November 26, 2009 3:19:40 AM Subject: Re: Deduplication in 1.4 Field collapsing has been used by many in their production environment. Got any pointers to public sites you know use it? I know of a high traffic site that used an early version, and it caused performance problems. Is double-tripping still required? Over the last few months the stability of the patch grew as quite a few bugs were fixed. The only big feature missing currently is caching of the collapsing algorithm. I'm currently working on that and Is it also full distributed-search-ready? I will put it in a new patch in the coming days. So yes, the patch is very near being production ready. Thanks, Otis Martijn 2009/11/26 KaktuChakarabati : Hey Otis, Yep, I realized this myself after playing some with the dedupe feature yesterday. So it does look like field collapsing is what I need, pretty much. Any idea on how close it is to being production-ready? Thanks, -Chak Otis Gospodnetic wrote: Hi, As far as I know, the point of deduplication in Solr ( http://wiki.apache.org/solr/Deduplication ) is to detect a duplicate document before indexing it, in order to avoid duplicates in the index in the first place. What you are describing is closer to the field collapsing patch in SOLR-236.
Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: KaktuChakarabati To: solr-user@lucene.apache.org Sent: Tue, November 24, 2009 5:29:00 PM Subject: Deduplication in 1.4 Hey, I've been trying to find some documentation on using this feature in 1.4 but the Wiki page is a little sparse. In specific, here's what I'm trying to do: I have a field, say 'duplicate_group_id', that I'll populate based on some offline document deduplication process I have. All I want is for Solr to compute a 'duplicate_signature' field based on this one at update time, so that when I search for documents later, all documents with the same original 'duplicate_group_id' value will be rolled up (e.g. I'll just get the first one that came back according to relevancy). I enabled the deduplication processor and put it into the updater, but I'm not seeing any difference in returned results (i.e. results with the same duplicate_id are returned separately). Is there anything I need to supply at query time for this to take effect? What should the behaviour be? Is there any working example of this? Anything will be helpful. Thanks, Chak -- View this message in context: http://old.nabble.com/Deduplication-in-1.4-tp26504403p26504403.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://old.nabble.com/Deduplication-in-1.4-tp26504403p26522386.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: question about collapse.type = adjacent
Hi Michael, Field collapsing is basically done in two steps. The first step is to get the uncollapsed, sorted (whether by score or by a field value) documents, and the second step is to apply the collapse algorithm on the uncollapsed documents. So yes, when specifying collapse.type=adjacent the documents get collapsed after the sort has been applied, but this is also the case when not specifying collapse.type=adjacent. I hope this answers your question. Cheers, Martijn 2009/11/2 michael8 mich...@saracatech.com: Hi, I would like to confirm whether 'adjacent' in collapse.type means the documents (with the same collapse field value) are considered adjacent *after* the 'sort' param from the query has been applied, or *before*? I would think it would be *after*, since the collapse feature is primarily meant for presentation use. Thanks, Michael -- View this message in context: http://old.nabble.com/question-about-collapse.type-%3D-adjacent-tp26157114p26157114.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
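The two-step behaviour described above can be illustrated with a toy sketch (this is not the patch's actual code): sort first, then collapse. With adjacent collapsing only neighbouring duplicates in the sorted list merge; non-adjacent collapsing merges duplicates wherever they occur.

```python
from itertools import groupby

# Toy documents: (sort_key, collapse_field_value)
docs = [(1, "a"), (2, "a"), (3, "b"), (4, "a")]
docs.sort()  # step 1: sort the uncollapsed result

# Step 2a: adjacent collapsing -- keep the first doc of each *run*
adjacent = [next(g) for _, g in groupby(docs, key=lambda d: d[1])]

# Step 2b: non-adjacent collapsing -- keep the first doc per value overall
seen, non_adjacent = set(), []
for d in docs:
    if d[1] not in seen:
        seen.add(d[1])
        non_adjacent.append(d)

print(adjacent)      # [(1, 'a'), (3, 'b'), (4, 'a')]
print(non_adjacent)  # [(1, 'a'), (3, 'b')]
```

Note how the trailing (4, "a") survives adjacent collapsing because it is separated from the other "a" documents by a "b" document in sort order.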
Re: weird problem with letters S and T
I think that is not a problem, because you are only storing one character per field. There are other text field types that do not have the stop word filter, so give your first-letter field such a field type. That way the stopword filter is only disabled for searches on the first-letter field. Cheers, Martijn 2009/10/28 Joel Nylund jnyl...@yahoo.com: Thanks Bern, now that you mention it they are in there. I assume if I remove them it will work, but I probably don't want to do that, right? Is there a way for this particular query to ignore stopwords? thanks Joel On Oct 28, 2009, at 6:20 PM, Bernadette Houghton wrote: Hi Joel, I had a similar issue the other day; in my case the solution turned out to be that the letters were stopwords. Don't know if this is your answer, but worth checking. Bern -Original Message- From: Joel Nylund [mailto:jnyl...@yahoo.com] Sent: Thursday, 29 October 2009 9:17 AM To: solr-user@lucene.apache.org Subject: weird problem with letters S and T (I am super new to solr, sorry if this is an easy one) Hi, I want to support an A-Z type view of my data. I have a DataImportHandler that uses sql (my query is complex, but the part that matters is): SELECT f.id, f.title, LEFT(f.title,1) as firstLetterTitle FROM Foo f I can create this index with no issues. I can query the title with no problem: http://localhost:8983/solr/select?q=title:super I can query the first letters mostly with no problem: http://localhost:8983/solr/select?q=firstLetterTitle:a Returns all the foos with the first letter a. This actually works with every letter except S and T. If I query those, I get no results. The weird thing: if I do the title query above with Super I get lots of results, and the xml shows the firstLetterTitle for those to be S: <doc> <str name="firstLetterTitle">S</str> <str name="id">84861348</str> <str name="title">Super Cool</str> </doc> <doc> <str name="firstLetterTitle">S</str> <str name="id">108692</str> <str name="title">Super 45</str> </doc> etc.
Any ideas, are S and T special characters in Solr queries? Here is the response from the s query with debug=true:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">24</int>
    <lst name="params">
      <str name="q">firstLetterTitle:s</str>
      <str name="debugQuery">true</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">firstLetterTitle:s</str>
    <str name="querystring">firstLetterTitle:s</str>
    <str name="parsedquery"/>
    <str name="parsedquery_toString"/>
    <lst name="explain"/>
    <str name="QParser">OldLuceneQParser</str>
    <lst name="timing">
      <double name="time">2.0</double>
      <lst name="prepare">
        <double name="time">1.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">1.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
      <lst name="process">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
    </lst>
  </lst>
</response>

thanks Joel
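Martijn's suggestion in the reply above can be sketched as a schema.xml fragment. This is a hypothetical field type, not taken from the thread; the only essential point is that the analyzer chain for the first-letter field contains no StopFilterFactory, so single letters like s and t survive both indexing and querying.

```xml
<!-- Hypothetical sketch: an analyzer with no stop-word filter -->
<fieldType name="letterOnly" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="firstLetterTitle" type="letterOnly" indexed="true" stored="true"/>
```

In a real schema the fieldType belongs in the <types> section and the field in <fields>; the tokenizer and filter choices here are illustrative, and only the absence of StopFilterFactory matters for this problem.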
Re: field collapsing bug (java.lang.ArrayIndexOutOfBoundsException)
Hi Joe, Can you give a bit more context info? Like the exact search and the field types you are using, for example. Also, are you doing a lot of frequent updates to the index? Cheers, Martijn 2009/10/23 Joe Calderon calderon@gmail.com: seems to happen when sorting on anything besides strictly score; even score desc, num desc triggers it, using the latest nightly and the 10/14 patch Problem accessing /solr/core1/select. Reason: 4731592 java.lang.ArrayIndexOutOfBoundsException: 4731592 at org.apache.lucene.search.FieldComparator$StringOrdValComparator.copy(FieldComparator.java:660) at org.apache.solr.search.NonAdjacentDocumentCollapser$DocumentComparator.compare(NonAdjacentDocumentCollapser.java:235) at org.apache.solr.search.NonAdjacentDocumentCollapser$DocumentPriorityQueue.lessThan(NonAdjacentDocumentCollapser.java:173) at org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:158) at org.apache.solr.search.NonAdjacentDocumentCollapser.doCollapsing(NonAdjacentDocumentCollapser.java:95) at org.apache.solr.search.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:208) at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:98) at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:66) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1148) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:387) at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:539) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:520)
Re: field collapsing bug (java.lang.ArrayIndexOutOfBoundsException)
I was able to reproduce the exact same stacktrace you sent. The exception occurred when I removed a document from a newly created index (with a commit) and then did a search with field collapsing enabled. I have attached a new patch to SOLR-236 that includes a fix for this bug. Martijn 2009/10/25 Martijn v Groningen martijn.is.h...@gmail.com: Hi Joe, Can you give a bit more context info? Like the exact search and the field types you are using, for example. Also, are you doing a lot of frequent updates to the index? Cheers, Martijn 2009/10/23 Joe Calderon calderon@gmail.com: seems to happen when sorting on anything besides strictly score; even score desc, num desc triggers it, using the latest nightly and the 10/14 patch Problem accessing /solr/core1/select. Reason: 4731592 java.lang.ArrayIndexOutOfBoundsException: 4731592 at org.apache.lucene.search.FieldComparator$StringOrdValComparator.copy(FieldComparator.java:660) at org.apache.solr.search.NonAdjacentDocumentCollapser$DocumentComparator.compare(NonAdjacentDocumentCollapser.java:235) at org.apache.solr.search.NonAdjacentDocumentCollapser$DocumentPriorityQueue.lessThan(NonAdjacentDocumentCollapser.java:173) at org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:158) at org.apache.solr.search.NonAdjacentDocumentCollapser.doCollapsing(NonAdjacentDocumentCollapser.java:95) at org.apache.solr.search.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:208) at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:98) at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:66) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1148) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:387) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:539) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:520)
Re: Collapse with multiple fields
No, this is actually not supported at the moment. If you really need to collapse on two different fields, you can concatenate the two fields together into another field while indexing and then collapse on that field. Martijn 2009/10/23 Thijs vonk.th...@gmail.com: I haven't had time to actually ask this on the list myself, but seeing this, I just had to reply. I was wondering this myself. Thijs On 23-10-2009 5:50, R. Tan wrote: Hi, Is it possible to collapse the results from multiple fields? Rih
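The concatenation workaround can be sketched at indexing time on the client side; the field names and separator below are hypothetical:

```python
# Hypothetical document about to be sent to Solr: combine the two
# fields you would like to collapse on into one auxiliary field,
# then collapse on that field (collapse.field=brand_color).
doc = {"id": "1", "brand": "acme", "color": "red"}
doc["brand_color"] = f"{doc['brand']}_{doc['color']}"
print(doc["brand_color"])  # acme_red
```

Pick a separator that cannot occur in either field value, so distinct field pairs can never produce the same concatenated key.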
Re: JVM OOM when using field collapse component
No, I have not encountered OOM exceptions yet with the current field collapse patch. How large is your configured JVM heap space (-Xmx)? Field collapsing requires more memory than regular searches do. Does Solr run out of memory during the first search(es), or does it run out of memory after a while, once it has performed quite a few field collapse searches? I see that you are also using the collapse.includeCollapsedDocs.fl parameter for your search. This feature will require more memory than a normal field collapse search. I normally give the Solr instance a heap space of 1024M when having an index of a few million documents. Martijn 2009/10/2 Joe Calderon calderon@gmail.com: i've gotten two different out of memory errors while using the field collapsing component, using the latest patch (2009-09-26) and the latest nightly; has anyone else encountered similar problems? my collection is 5 million results but i've gotten the error collapsing as little as a few thousand SEVERE: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:173) at org.apache.lucene.util.OpenBitSet.ensureCapacityWords(OpenBitSet.java:749) at org.apache.lucene.util.OpenBitSet.ensureCapacity(OpenBitSet.java:757) at org.apache.lucene.util.OpenBitSet.expandingWordNum(OpenBitSet.java:292) at org.apache.lucene.util.OpenBitSet.set(OpenBitSet.java:233) at org.apache.solr.search.AbstractDocumentCollapser.addCollapsedDoc(AbstractDocumentCollapser.java:402) at org.apache.solr.search.NonAdjacentDocumentCollapser.doCollapsing(NonAdjacentDocumentCollapser.java:115) at org.apache.solr.search.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:208) at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:98) at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:66) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1148) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:387) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:539) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:520) SEVERE: java.lang.OutOfMemoryError: Java heap space at org.apache.solr.util.DocSetScoreCollector.init(DocSetScoreCollector.java:44) at org.apache.solr.search.NonAdjacentDocumentCollapser.doQuery(NonAdjacentDocumentCollapser.java:68) at org.apache.solr.search.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:205) at 
org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:98) at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:66) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at
Re: field collapsing sums
Well, that is odd. How have you configured field collapsing with the dismax request handler? The collapse count should be X - 1 (if collapse.threshold=1). Martijn 2009/10/1 Joe Calderon calderon@gmail.com: thx for the reply, i just want the number of dupes in the query result, but it seems i don't get the correct totals. for example, a non-collapsed dismax query for belgian beer returns X results, but when i collapse and sum the number of docs under collapse_counts, it's much less than X. it does seem to work when the collapsed results fit on one page (10 rows in my case) --joe 2) It seems that you are using the parameters as intended. The collapsed documents will contain all documents (from the whole query result) that have been collapsed on a certain field value that occurs in the result set being displayed. That is how it should work. But if I'm understanding you correctly, you want to display all dupes from the whole query result set (also those whose collapse field value does not occur in the displayed result set)? -- Met vriendelijke groet, Martijn van Groningen
Re: field collapsing sums
Hi Joe, Currently the patch does not do that, but you can do something else that might help you in getting your summed stock. In the latest patch you can include fields of collapsed documents in the result, per distinct field value. If you specify collapse.includeCollapseDocs.fl=num_in_stock in the request and, let's say, you collapse on brand, then in the response you will receive the following xml:

<lst name="collapsedDocs">
  <result name="brand1" numFound="48" start="0">
    <doc><str name="num_in_stock">2</str></doc>
    <doc><str name="num_in_stock">3</str></doc>
    ...
  </result>
  <result name="brand2" numFound="9" start="0">
    ...
  </result>
</lst>

On the client side you can do whatever you want with this data, for example sum it together. Although the patch does not sum for you, I think it will allow you to implement your requirement without too much hassle. Cheers, Martijn 2009/10/1 Matt Weber m...@mattweber.org: You might want to see how the stats component works with field collapsing. Thanks, Matt Weber On Sep 30, 2009, at 5:16 PM, Uri Boness wrote: Hi, At the moment I think the most appropriate place to put it is in the AbstractDocumentCollapser (in the getCollapseInfo method). Though, it might not be the most efficient. Cheers, Uri Joe Calderon wrote: hello all, i have a question on the field collapsing patch. say i have an integer field called num_in_stock and i collapse by some other column; is it possible to sum up that integer field and return the total in the output? if not, how would i go about extending the collapsing component to support that? thx much --joe
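The client-side summing Martijn suggests might look like this sketch, which parses a collapsedDocs fragment of the shape described above (the payload here is a made-up sample, not real Solr output):

```python
import xml.etree.ElementTree as ET

# Hypothetical collapsedDocs fragment in the shape described above.
xml = """
<lst name="collapsedDocs">
  <result name="brand1" numFound="48" start="0">
    <doc><str name="num_in_stock">2</str></doc>
    <doc><str name="num_in_stock">3</str></doc>
  </result>
</lst>
"""
root = ET.fromstring(xml)
totals = {}
for result in root.findall("result"):
    stocks = result.findall("doc/str[@name='num_in_stock']")
    totals[result.get("name")] = sum(int(s.text) for s in stocks)
print(totals)  # {'brand1': 5}
```

As noted in the thread, this sum only covers the collapsed documents; depending on collapse.threshold you may also want to add the num_in_stock of the displayed document(s) per group.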
Re: field collapsing sums
1) That is correct. Including collapsed document fields can make your search significantly slower (depending on how many documents are returned). 2) It seems that you are using the parameters as intended. The collapsed documents will contain all documents (from the whole query result) that have been collapsed on a certain field value that occurs in the result set being displayed. That is how it should work. But if I'm understanding you correctly, you want to display all dupes from the whole query result set (also those whose collapse field value does not occur in the displayed result set)? Martijn 2009/10/1 Joe Calderon calderon@gmail.com: hello martijn, thx for the tip. i tried that approach but ran into two snags: 1. returning the fields makes collapsing a lot slower for results, but that might just be the nature of iterating large results. 2. it seems like only dupes of records on the first page are returned, or is there a setting i'm missing? currently i'm only sending collapse.field=brand and collapse.includeCollapseDocs.fl=num_in_stock --joe On Thu, Oct 1, 2009 at 1:14 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi Joe, Currently the patch does not do that, but you can do something else that might help you in getting your summed stock. In the latest patch you can include fields of collapsed documents in the result, per distinct field value. If you specify collapse.includeCollapseDocs.fl=num_in_stock in the request and, let's say, you collapse on brand, then in the response you will receive the following xml:

<lst name="collapsedDocs">
  <result name="brand1" numFound="48" start="0">
    <doc><str name="num_in_stock">2</str></doc>
    <doc><str name="num_in_stock">3</str></doc>
    ...
  </result>
  <result name="brand2" numFound="9" start="0">
    ...
  </result>
</lst>

On the client side you can do whatever you want with this data, for example sum it together. Although the patch does not sum for you, I think it will allow you to implement your requirement without too much hassle.
Cheers, Martijn 2009/10/1 Matt Weber m...@mattweber.org: You might want to see how the stats component works with field collapsing. Thanks, Matt Weber On Sep 30, 2009, at 5:16 PM, Uri Boness wrote: Hi, At the moment I think the most appropriate place to put it is in the AbstractDocumentCollapser (in the getCollapseInfo method). Though, it might not be the most efficient. Cheers, Uri Joe Calderon wrote: hello all, i have a question on the field collapsing patch. say i have an integer field called num_in_stock and i collapse by some other column; is it possible to sum up that integer field and return the total in the output? if not, how would i go about extending the collapsing component to support that? thx much --joe -- Met vriendelijke groet, Martijn van Groningen
Re: Mapping SolrDoc to SolrInputDoc
Hi Licinio, You can use ClientUtils.toSolrInputDocument(...), which converts a SolrDocument to a SolrInputDocument. Martijn 2009/9/16 Licinio Fernández Maurelo licinio.fernan...@gmail.com: Hi there, currently I'm working on a small app which creates an embedded Solr server, reads all documents from one core and puts these docs into another one. The purpose of this app is to apply (small) schema.xml changes to indexed data (offline), resulting in a new index with documents updated to the schema.xml changes. What I want to know is whether there is an easy way to map a SolrDocument to a SolrInputDocument. Any help would be much appreciated -- Lici -- Met vriendelijke groet, Martijn van Groningen
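Conceptually, converting a retrieved document into an input document just copies the stored field values into a fresh document; in SolrJ, ClientUtils.toSolrInputDocument does this for you. A language-neutral sketch of the idea (the field names, and the choice to drop the computed score field, are illustrative assumptions, not what ClientUtils itself does):

```python
# A retrieved document may carry computed fields (score here) that you
# would not want to write back; copy the stored fields into a new
# input document and re-index that against the new schema.
retrieved = {"id": "42", "title": "hello", "score": 1.23}
input_doc = {k: v for k, v in retrieved.items() if k != "score"}
print(input_doc)  # {'id': '42', 'title': 'hello'}
```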
Re: Monitoring split time for fq queries when filter cache is used
Hi Rahul, Yes, your understanding is correct, but it is not possible to monitor these actions separately with Solr. Martijn 2009/9/1 Rahul R rahul.s...@gmail.com: Hello, I am trying to measure the benefit that I am getting out of using the filter cache. As I understand it, there are two major parts to an fq query. Please correct me if I am wrong: - doing full index queries for each of the fq params (if the filter cache is used, this result will be retrieved from the cache) - set intersection of the above results (will be done again even with the filter cache enabled) Is there any flag/setting that I can enable to monitor how much time the above operations take separately, i.e. the querying and the set intersection? Regards Rahul -- Met vriendelijke groet, Martijn van Groningen
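Rahul's two-phase picture can be sketched with Python sets standing in for Lucene DocSets; the cache dict plays the role of Solr's filterCache, and the lookup table is a stand-in for a real index search (all names here are hypothetical):

```python
# Toy filter cache: fq string -> set of matching doc ids.
filter_cache = {}

def docset(fq, search):
    """Phase 1: full-index lookup for one fq, served from cache on a hit."""
    if fq not in filter_cache:
        filter_cache[fq] = search(fq)  # the expensive part the cache avoids
    return filter_cache[fq]

# Stand-in for index lookups of two fq parameters.
lookups = {"color:red": {1, 2, 3, 5}, "size:L": {2, 3, 8}}

# Phase 2: the set intersection runs on every request, cached or not.
result = docset("color:red", lookups.get) & docset("size:L", lookups.get)
print(sorted(result))  # [2, 3]
```

This makes the point of the thread concrete: caching only removes the phase-1 lookups on repeat queries, while the phase-2 intersection cost remains, and Solr does not report the two phases separately.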
Re: Problems importing HTML content contained within XML document
Hi Venn, I think what is happening when the BODY element is processed by the xpath expression (/document/category/BODY) is that it does not retrieve the text content from the P elements inside the BODY element. The expression will only retrieve text content that is directly a child of the BODY element. I do not know which xpath function(s) the DataImportHandler currently supports to return the text content of a node and all its child nodes. Maybe the expression /document/category/BODY/* will work. Cheers, Martijn 2009/8/19 venn hardy venn.ha...@hotmail.com: Hello, I have just started trying out Solr to index some XML documents that I receive. I am using Solr 1.3 and its HttpDataSource in conjunction with the XPathEntityProcessor. I am finding the data import really useful so far, but I am having a few problems when I try to import HTML contained within one of the XML tags (BODY). The data import just seems to ignore textContent silently, but it imports everything else. When I do a query through the Solr admin interface, only the id and author fields are displayed. Any ideas what I am doing wrong?
Thanks. This is what my dataConfig looks like:

<dataConfig>
  <dataSource type="HttpDataSource" />
  <document>
    <entity name="archive" pk="id"
            url="http://localhost:9080/data/20090817070752.xml"
            processor="XPathEntityProcessor"
            forEach="/document/category"
            transformer="DateFormatTransformer"
            stream="true" dataSource="dataSource">
      <field column="id" xpath="/document/category/reference" />
      <field column="textContent" xpath="/document/category/BODY" />
      <field column="author" xpath="/document/category/author" />
    </entity>
  </document>
</dataConfig>

This is how I have specified my schema fields:

<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="author" type="string" indexed="true" stored="true"/>
  <field name="textContent" type="text" indexed="true" stored="true" />
</fields>
<uniqueKey>id</uniqueKey>
<defaultSearchField>id</defaultSearchField>

And this is what my XML document looks like:

<document>
  <category>
    <reference>123456</reference>
    <author>Authori name</author>
    <BODY>
      <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem elit, lacinia ac blandit ac, tristique et ante. Phasellus varius varius felis ut vestibulum</P>
      <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem elit, lacinia ac blandit ac, tristique et ante. Phasellus varius varius felis ut vestibulum</P>
      <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem elit, lacinia ac blandit ac, tristique et ante. Phasellus varius varius felis ut vestibulum</P>
    </BODY>
  </category>
</document>
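The behaviour described in the reply (only text directly under BODY is retrieved) can be checked outside Solr with a quick sketch. This uses Python's ElementTree rather than the DataImportHandler itself, so it only illustrates the XPath text semantics, not DIH:

```python
import xml.etree.ElementTree as ET

doc = ("<document><category><BODY>"
       "<P>first para</P><P>second para</P>"
       "</BODY></category></document>")
body = ET.fromstring(doc).find("./category/BODY")

# Text directly under BODY: nothing, because it all sits inside P elements.
print(body.text)                 # None
# Gathering descendant text recovers the paragraphs.
print("".join(body.itertext()))  # first parasecond para
```

This is exactly why selecting /document/category/BODY yields an empty textContent while a descendant-aware extraction would not.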