RE: Exception using distributed field-collapsing
Does it work in the non-distributed case? Is the field you're grouping on stored? What is the type of the uniqueKey field? Is it stored and indexed?

I've had a problem with distributed search not working when the uniqueKey field was indexed but not stored. Also, in distributed searches the uniqueKey is used to retrieve documents from the shards, so if it were, say, a date, that may be causing the issue.

Cody

-----Original Message-----
From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com]
Sent: Wednesday, June 20, 2012 1:54 PM
To: solr-user@lucene.apache.org
Subject: RE: Exception using distributed field-collapsing

> Hi Bryan,
> What is the fieldtype of the groupField? You can only group by a field of type string, as described in the wiki: http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters
> When you group by another field type, an HTTP 400 should be returned instead of this error. At least that's what I'd expect.
> Martijn

Martijn,

The group-by field is a string. I have been unable to figure out how a date comes into the picture at all, and have basically been wondering whether there is some problem in the grouping code that misaligns the field values from different results in the group, so that it is not comparing like with like. Not a strong theory, just the only thing I can think of.

-- Bryan
RE: Exception using distributed field-collapsing
No, I believe it was a different exception; I'm just brainstorming. (It was a null reference, IIRC.) Does a *:* query with no sorting work?

Cody

-----Original Message-----
From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com]
Sent: Thursday, June 21, 2012 10:33 AM
To: solr-user@lucene.apache.org
Subject: RE: Exception using distributed field-collapsing

Cody,

> Does it work in the non distributed case?

Yes.

> Is the field you're grouping on stored? What is the type on the uniqueKey field? Is it stored and indexed?

The field I'm grouping on is a string, stored and indexed. The unique key field is a string, stored and indexed.

> I've had a problem with distributed not working when the uniqueKey field was indexed but not stored.

Was it the same exception I'm seeing?

-- Bryan

-----Original Message-----
From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com]
Sent: Wednesday, June 20, 2012 1:54 PM
To: solr-user@lucene.apache.org
Subject: RE: Exception using distributed field-collapsing

> Hi Bryan,
> What is the fieldtype of the groupField? You can only group by a field of type string, as described in the wiki: http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters
> When you group by another field type, an HTTP 400 should be returned instead of this error. At least that's what I'd expect.
> Martijn

Martijn,

The group-by field is a string. I have been unable to figure out how a date comes into the picture at all, and have basically been wondering whether there is some problem in the grouping code that misaligns the field values from different results in the group, so that it is not comparing like with like. Not a strong theory, just the only thing I can think of.

-- Bryan
Indexing Polygons
Hi All,

I'm trying to figure out how to index polygons in Solr (trunk). I'm using LSP right now, as the Solr integration of the new spatial module hasn't been completed. I have searching for a point using a polygon working, but I'm also looking to search for a polygon using a point. I've seen some indication that LSP supports this, but I haven't been able to find an example. What field type would I need to use? Would it be multivalued?

Please and thank you!
Cody
RE: Grouping ngroups count
Hello,

When you say 2 slices, do you mean 2 shards? As in, you're doing a distributed query? If you're doing a distributed query, then for group.ngroups to work you need to ensure that all documents for a group exist on a single shard.

However, what you're describing sounds an awful lot like the JIRA issue that I entered a while ago for distributed grouping. I found that the hit count was coming only from the shards that ended up having results in the documents that were returned. I didn't test group.ngroups at the time.

https://issues.apache.org/jira/browse/SOLR-3316

If this is a similar issue, then you should create a new JIRA issue.

Cody

-----Original Message-----
From: Francois Perron [mailto:francois.per...@wantedanalytics.com]
Sent: Tuesday, May 01, 2012 6:47 AM
To: solr-user@lucene.apache.org
Subject: Grouping ngroups count

Hello all,

I tried to use grouping with 2 slices on an index of 35K documents. When I ask for the top 10 rows, grouped by field A, it gives me about 16K groups. But if I ask for the top 20K rows, the ngroups property is now at 30K. Do you know why, and of course how to fix it?

Thanks.
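
The co-location requirement mentioned in the reply above can be illustrated with a small sketch. It is not Solr code: the field name `A` is taken from the question, while `shard_for_group`, `partition_by_group` and `summed_group_count` are hypothetical helpers, and the merge rule (per-shard group counts simply added together) is an assumption made for illustration.

```python
import zlib
from collections import defaultdict


def shard_for_group(group_value: str, num_shards: int) -> int:
    """Map a group key to a shard index so equal keys always land together."""
    # crc32 is deterministic across runs, unlike the built-in hash() for str.
    return zlib.crc32(group_value.encode("utf-8")) % num_shards


def partition_by_group(docs, group_field, num_shards):
    """Assign every document to the shard chosen by its group key."""
    shards = defaultdict(list)
    for doc in docs:
        shards[shard_for_group(doc[group_field], num_shards)].append(doc)
    return shards


def summed_group_count(shards, group_field):
    """Add up the distinct group keys seen on each shard (assumed merge rule)."""
    return sum(len({d[group_field] for d in docs}) for docs in shards.values())


# 20 documents spread over 5 distinct values of the (hypothetical) field "A".
docs = [{"id": str(i), "A": f"group-{i % 5}"} for i in range(20)]
shards = partition_by_group(docs, "A", num_shards=2)
print(summed_group_count(shards, "A"))  # matches the number of distinct groups
```

If the same documents were instead split round-robin, each group would show up on both shards and the summed count would be inflated, which is the kind of over-count the co-location rule is meant to avoid.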
RE: To truncate or not to truncate (group.truncate vs. facet)
You may be able to fake your price requirements by rounding at index time. For instance, if you wanted $10-19, $20-29 and $30+, you could create a second price field specifically for faceting, round down to 10, 20 or 30 at index time, and then facet on that field.

Cody

-----Original Message-----
From: danjfoley [mailto:d...@micamedia.com]
Sent: Monday, April 09, 2012 7:26 PM
To: solr-user@lucene.apache.org
Subject: Re: To truncate or not to truncate (group.truncate vs. facet)

Is this planned as a future feature? Is it in the bug tracker as a feature yet? Just wondering how long until it is a feature. I could live without price counts for a bit.

Sent from my phone

----- Reply message -----
From: Martijn v Groningen-2 [via Lucene] ml-node+s472066n3897768...@n3.nabble.com
Date: Mon, Apr 9, 2012 3:31 pm
Subject: To truncate or not to truncate (group.truncate vs. facet)
To: danjfoley d...@micamedia.com

The group.facet option only works for field facets (facet.field). Other facet types (query, range and pivot) aren't supported yet. The group.facet option works for both single-valued and multivalued fields specified in the facet.field parameter.

Martijn

On 9 April 2012 20:58, danjfoley d...@micamedia.com wrote:

I am using group.facet and it works fine for regular facet.field but not for facet.query.

Sent from my phone

----- Reply message -----
From: Young, Cody [via Lucene] ml-node+s472066n3897487...@n3.nabble.com
Date: Mon, Apr 9, 2012 1:38 pm
Subject: To truncate or not to truncate (group.truncate vs. facet)
To: danjfoley d...@micamedia.com

One other thing: I believe that you need to be using facet.field on single-valued string fields for group.facet to function properly. Are the fields you're faceting on multiValued=false?

Cody

-----Original Message-----
From: Young, Cody [mailto:cody.yo...@move.com]
Sent: Monday, April 09, 2012 10:36 AM
To: solr-user@lucene.apache.org
Subject: RE: To truncate or not to truncate (group.truncate vs. facet)

Have you tried adding the parameter group.facet=true?

Cody

-----Original Message-----
From: danjfoley [mailto:d...@micamedia.com]
Sent: Monday, April 09, 2012 10:09 AM
To: solr-user@lucene.apache.org
Subject: Re: To truncate or not to truncate (group.truncate vs. facet)

I did get this working with version 4. However, my facet queries still don't group.

Sent from my phone

----- Reply message -----
From: Young, Cody [via Lucene] ml-node+s472066n3897366...@n3.nabble.com
Date: Mon, Apr 9, 2012 12:45 pm
Subject: To truncate or not to truncate (group.truncate vs. facet)
To: danjfoley d...@micamedia.com

I believe you're looking for what's called Matrix Counts. Please see this JIRA issue; to my knowledge it has been committed to trunk but not 3.x.

https://issues.apache.org/jira/browse/SOLR-2898

This feature is accessed by using group.facet=true.

Cody

-----Original Message-----
From: danjfoley [mailto:d...@micamedia.com]
Sent: Saturday, April 07, 2012 7:02 PM
To: solr-user@lucene.apache.org
Subject: Re: To truncate or not to truncate (group.truncate vs. facet)

I've been searching for a solution to my issue, and this seems to come closest to it, but not exactly.

I am indexing clothing. Each article of clothing comes in many sizes and colors, and can belong to any number of categories.

For example, I add 6 documents to Solr as follows:

product, color, size, category
shirt A, red, small, valentines day
shirt A, red, large, valentines day
shirt A, blue, small, valentines day
shirt A, blue, large, valentines day
shirt A, green, small, valentines day
shirt A, green, large, valentines day

I'd like my facet counts to return as follows:

color
  red (1)
  blue (1)
  green (1)
size
  small (1)
  large (1)
category
  valentines day (1)

But they come back like this:

color
  red (2)
  blue (2)
  green (2)
size
  small (2)
  large (2)
category
  valentines day (6)

I see the group.facet parameter in version 4.0 does exactly this. However, how can I make this happen now? There are all sorts of e-commerce systems out there that facet exactly how I'm asking. I thought Solr was supposed to be the very best, fastest search system, yet it doesn't seem to be able to facet correctly for items with multiple values? Am I indexing my data wrong? How can I make this happen?

--
View this message in context: http://lucene.472066.n3.nabble.com/To-truncate-or-not-to-truncate-group-truncate-vs-facet-tp3838797p3893744.html
Sent from the Solr - User mailing list archive at Nabble.com.
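
The index-time rounding suggested at the top of this message could look roughly like the following. This is only a sketch: the bucket width of 10 and the cap of 30 come from the example in the reply, while `price_bucket` and the field name `price_facet` are hypothetical and not part of any Solr API.

```python
def price_bucket(price: float, width: int = 10, cap: int = 30) -> int:
    """Round a price down to the lower bound of its bucket.

    Everything at or above `cap` falls into the open-ended top bucket.
    """
    if price < 0:
        raise ValueError("price must be non-negative")
    bucket = int(price // width) * width
    return min(bucket, cap)


def add_facet_price(doc: dict) -> dict:
    """Return a copy of the document with a second, facet-only price field."""
    enriched = dict(doc)
    # "price_facet" is a hypothetical field name; it would be a plain
    # facet.field in the query, so no range or query facet is needed.
    enriched["price_facet"] = price_bucket(doc["price"])
    return enriched


print(add_facet_price({"id": "shirt-a-red-small", "price": 24.99}))
```

Because the bucketed value is an ordinary field, it can be used with facet.field, which is the facet type that group.facet is said to support in this thread.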
RE: To truncate or not to truncate (group.truncate vs. facet)
I believe you're looking for what's called Matrix Counts. Please see this JIRA issue; to my knowledge it has been committed to trunk but not 3.x.

https://issues.apache.org/jira/browse/SOLR-2898

This feature is accessed by using group.facet=true.

Cody

-----Original Message-----
From: danjfoley [mailto:d...@micamedia.com]
Sent: Saturday, April 07, 2012 7:02 PM
To: solr-user@lucene.apache.org
Subject: Re: To truncate or not to truncate (group.truncate vs. facet)

I've been searching for a solution to my issue, and this seems to come closest to it, but not exactly.

I am indexing clothing. Each article of clothing comes in many sizes and colors, and can belong to any number of categories.

For example, I add 6 documents to Solr as follows:

product, color, size, category
shirt A, red, small, valentines day
shirt A, red, large, valentines day
shirt A, blue, small, valentines day
shirt A, blue, large, valentines day
shirt A, green, small, valentines day
shirt A, green, large, valentines day

I'd like my facet counts to return as follows:

color
  red (1)
  blue (1)
  green (1)
size
  small (1)
  large (1)
category
  valentines day (1)

But they come back like this:

color
  red (2)
  blue (2)
  green (2)
size
  small (2)
  large (2)
category
  valentines day (6)

I see the group.facet parameter in version 4.0 does exactly this. However, how can I make this happen now? There are all sorts of e-commerce systems out there that facet exactly how I'm asking. I thought Solr was supposed to be the very best, fastest search system, yet it doesn't seem to be able to facet correctly for items with multiple values? Am I indexing my data wrong? How can I make this happen?

--
View this message in context: http://lucene.472066.n3.nabble.com/To-truncate-or-not-to-truncate-group-truncate-vs-facet-tp3838797p3893744.html
Sent from the Solr - User mailing list archive at Nabble.com.
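
To make the difference between plain facet counts and the group-aware counts requested above concrete, here is a small sketch over the six example documents. It only illustrates the counting idea (one count per distinct product instead of one per document) and is not how Solr implements group.facet; the function names are hypothetical.

```python
from collections import defaultdict

# The six example documents from the message, one per color/size variant.
DOCS = [
    {"product": "shirt A", "color": c, "size": s, "category": "valentines day"}
    for c in ("red", "blue", "green")
    for s in ("small", "large")
]


def facet_counts(docs, field):
    """Plain faceting: every matching document adds one to its value."""
    counts = defaultdict(int)
    for doc in docs:
        counts[doc[field]] += 1
    return dict(counts)


def grouped_facet_counts(docs, field, group_field):
    """Group-aware faceting: each value counts distinct groups, not documents."""
    groups = defaultdict(set)
    for doc in docs:
        groups[doc[field]].add(doc[group_field])
    return {value: len(members) for value, members in groups.items()}


print(facet_counts(DOCS, "color"))
print(grouped_facet_counts(DOCS, "color", "product"))
print(facet_counts(DOCS, "category"))
print(grouped_facet_counts(DOCS, "category", "product"))
```

With the product as the group field, every color and the category each collapse to a count of 1, which is the result the poster is asking for.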
RE: To truncate or not to truncate (group.truncate vs. facet)
You tried adding the parameter group.facet=true ? Cody -Original Message- From: danjfoley [mailto:d...@micamedia.com] Sent: Monday, April 09, 2012 10:09 AM To: solr-user@lucene.apache.org Subject: Re: To truncate or not to truncate (group.truncate vs. facet) I did get this working with version 4. However my facet queries still don't group. Sent from my phone - Reply message - From: Young, Cody [via Lucene] ml-node+s472066n3897366...@n3.nabble.com Date: Mon, Apr 9, 2012 12:45 pm Subject: To truncate or not to truncate (group.truncate vs. facet) To: danjfoley d...@micamedia.com I believe you're looking for what's called, Matrix Counts Please see this JIRA issue. To my knowledge it has been committed in trunk but not 3.x. https://issues.apache.org/jira/browse/SOLR-2898 This feature is accessed by using group.facet=true Cody -Original Message- From: danjfoley [mailto:d...@micamedia.com] Sent: Saturday, April 07, 2012 7:02 PM To: solr-user@lucene.apache.org Subject: Re: To truncate or not to truncate (group.truncate vs. facet) I've been searching for a solution to my issue, and this seems to come closest to it. But not exactly. I am indexing clothing. Each article of clothing comes in many sizes and colors, and can belong to any number of categories. For example take the following: I add 6 documents to solr as follows: product, color, size, category shirt A, red, small, valentines day shirt A, red, large, valentines day shirt A, blue, small, valentines day shirt A, blue, large, valentines day shirt A, green, small, valentines day shirt A, green, large, valentines day I'd like my facet counts to return as follows: color red (1) blue (1) green (1) size small (1) large (1) category valentines day (1) But they come back like this: color: red (2) blue (2) green (2) size: small (2) large (2) category valentines day (6) I see the group.facet parameter in version 4.0 does exactly this. However how can I make this happen now? 
There are all sorts of e-commerce systems out there that facet exactly how I'm asking. I thought Solr was supposed to be the very best, fastest search system, yet it doesn't seem to be able to facet correctly for items with multiple values? Am I indexing my data wrong? How can I make this happen?

--
View this message in context: http://lucene.472066.n3.nabble.com/To-truncate-or-not-to-truncate-group-truncate-vs-facet-tp3838797p3893744.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
View this message in context: http://lucene.472066.n3.nabble.com/To-truncate-or-not-to-truncate-group-truncate-vs-facet-tp3838797p3897422.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: To truncate or not to truncate (group.truncate vs. facet)
One other thing, I believe that you need to be using facet.field on single valued string fields for group.facet to function properly. Are the fields you're faceting on multiValued=false? Cody -Original Message- From: Young, Cody [mailto:cody.yo...@move.com] Sent: Monday, April 09, 2012 10:36 AM To: solr-user@lucene.apache.org Subject: RE: To truncate or not to truncate (group.truncate vs. facet) You tried adding the parameter group.facet=true ? Cody -Original Message- From: danjfoley [mailto:d...@micamedia.com] Sent: Monday, April 09, 2012 10:09 AM To: solr-user@lucene.apache.org Subject: Re: To truncate or not to truncate (group.truncate vs. facet) I did get this working with version 4. However my facet queries still don't group. Sent from my phone - Reply message - From: Young, Cody [via Lucene] ml-node+s472066n3897366...@n3.nabble.com Date: Mon, Apr 9, 2012 12:45 pm Subject: To truncate or not to truncate (group.truncate vs. facet) To: danjfoley d...@micamedia.com I believe you're looking for what's called, Matrix Counts Please see this JIRA issue. To my knowledge it has been committed in trunk but not 3.x. https://issues.apache.org/jira/browse/SOLR-2898 This feature is accessed by using group.facet=true Cody -Original Message- From: danjfoley [mailto:d...@micamedia.com] Sent: Saturday, April 07, 2012 7:02 PM To: solr-user@lucene.apache.org Subject: Re: To truncate or not to truncate (group.truncate vs. facet) I've been searching for a solution to my issue, and this seems to come closest to it. But not exactly. I am indexing clothing. Each article of clothing comes in many sizes and colors, and can belong to any number of categories. 
For example, I add 6 documents to Solr as follows:

product, color, size, category
shirt A, red, small, valentines day
shirt A, red, large, valentines day
shirt A, blue, small, valentines day
shirt A, blue, large, valentines day
shirt A, green, small, valentines day
shirt A, green, large, valentines day

I'd like my facet counts to return as follows:

color
  red (1)
  blue (1)
  green (1)
size
  small (1)
  large (1)
category
  valentines day (1)

But they come back like this:

color
  red (2)
  blue (2)
  green (2)
size
  small (2)
  large (2)
category
  valentines day (6)

I see the group.facet parameter in version 4.0 does exactly this. However, how can I make this happen now? There are all sorts of e-commerce systems out there that facet exactly how I'm asking. I thought Solr was supposed to be the very best, fastest search system, yet it doesn't seem to be able to facet correctly for items with multiple values? Am I indexing my data wrong? How can I make this happen?

--
View this message in context: http://lucene.472066.n3.nabble.com/To-truncate-or-not-to-truncate-group-truncate-vs-facet-tp3838797p3893744.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
View this message in context: http://lucene.472066.n3.nabble.com/To-truncate-or-not-to-truncate-group-truncate-vs-facet-tp3838797p3897422.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Distributed grouping issue
Hi Martijn,

I created a JIRA issue and attached a test that fails. It seems to exhibit the same issue that I see on my local box. (If you run it multiple times you can see that the group value of the top doc changes between runs.)

Also, I had to add fixShardCount = true; in the constructor of the TestDistributedGrouping class, which caused another test case to fail. (It's commented out in the patch with a TODO above it.)

Please let me know if you need any other information.

https://issues.apache.org/jira/browse/SOLR-3316

Thanks!!
Cody

-----Original Message-----
From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of Martijn v Groningen
Sent: Monday, April 02, 2012 10:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Distributed grouping issue

I tried to reproduce this. However, matches always returns 4 in my case (when using rows=1 and rows=2). In your case the 2 documents on each core do belong to the same group, right?

I did find something else. If I use rows=0 then an error occurs. I think we need to investigate this further. Can you open an issue in Jira? I'm a bit busy today. We can then look into this further in the coming days.

Martijn

On 2 April 2012 23:00, Young, Cody cody.yo...@move.com wrote:

Okay, I've played with this a bit more. Found something interesting: when the groups returned do not include results from a core, then the core is excluded from the count. (I have 1 group, 2 documents per core.)

Example:

http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=1

<lst name="grouped">
  <lst name="group_field">
    <int name="matches">2</int>

Then, just by changing to rows=2:

http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=2

<lst name="grouped">
  <lst name="group_field">
    <int name="matches">4</int>

Let me know if you have any luck reproducing.

Thanks,
Cody

-----Original Message-----
From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of Martijn v Groningen
Sent: Monday, April 02, 2012 1:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Distributed grouping issue

> All documents of a group exist on a single shard, there are no cross-shard groups.

You only have to partition documents by group when the groupCount and some other features need to be accurate. For the matches this is not necessary. The matches are summed up when merging the shard responses.

I can't reproduce the error you are describing on a small local setup I have here. I have two Solr cores with a simple schema. Each core has 3 documents. When grouping, the matches element returns 6. I'm running on a trunk that I updated 30 minutes ago. Can you try to isolate the problem by testing with a small subset of your data?

Martijn

--
Kind regards,
Martijn van Groningen
RE: shards and grouping
I received this error when the Solr unique key field was set to stored=false. Once I changed it to stored=true, I stopped receiving this error.

Cody

-----Original Message-----
From: ramires [mailto:uy...@beriltech.com]
Sent: Tuesday, April 03, 2012 7:15 AM
To: solr-user@lucene.apache.org
Subject: shards and grouping

Hi,

I use Solr 4 with four shards, like this:

http://x.x.x.x:8984/solr/arama4/select?shards=192.168.200.202:8984/solr/ar1/,192.168.200.202:8984/solr/ar2,192.168.200.202:8984/solr/ar3,192.168.200.202:8984/solr/ar4&q=table&facet=true&facet.field=site&f.site.facet.numFacetTerms=1&facet.mincount=1&facet.limit=-1&group=true&group.field=site

All other queries work well, but grouped queries throw an error 500. Does grouping work in Solr 4?

HTTP ERROR 500
Problem accessing /solr/arama4/select. Reason: null

java.lang.NullPointerException
    at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:639)
    at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:536)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:331)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1308)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

--
View this message in context: http://lucene.472066.n3.nabble.com/shards-and-grouping-tp3881069p3881069.html
Sent from the Solr - User mailing list archive at Nabble.com.
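
A quick way to check for the condition described in the reply above (a uniqueKey field that is not stored) is to inspect the schema before running distributed grouping. The sketch below is an assumption-laden illustration: the schema snippet and the field names `id` and `site` are made up for the example, and `unique_key_is_stored` is a hypothetical helper, not part of Solr.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema.xml fragment; the real field names will differ.
SCHEMA = """
<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="false" required="true"/>
    <field name="site" type="string" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>
"""


def unique_key_is_stored(schema_xml: str) -> bool:
    """Report whether the field named in <uniqueKey> has stored="true"."""
    root = ET.fromstring(schema_xml)
    key = root.findtext("uniqueKey")
    if key is None:
        raise ValueError("schema has no <uniqueKey> element")
    key = key.strip()
    for field in root.iter("field"):
        if field.get("name") == key:
            # An absent attribute is reported as "not stored" here, because
            # the effective value would then come from the fieldType, which
            # this sketch does not resolve.
            return field.get("stored", "false").lower() == "true"
    raise ValueError(f"uniqueKey field {key!r} is not declared")


print(unique_key_is_stored(SCHEMA))
```

For the snippet above the check reports that the key is not stored, which is the configuration under which the reply says the error appeared.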
RE: Distributed grouping issue
In the case of group=false: numFound=26

In the case of group=true: <int name="matches">34000</int>

As a note, the grouped number changes when I hit refresh. It seems to display the count from any single shard. (The top match also changes.)

I haven't tried this in other versions of Solr.

All documents of a group exist on a single shard; there are no cross-shard groups.

Thanks,
Cody

-----Original Message-----
From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of Martijn v Groningen
Sent: Monday, April 02, 2012 3:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Distributed grouping issue

The matches element in the response should return the number of documents that matched the query, not the number of groups. Did you encounter this issue with other Solr versions as well (3.5 or another nightly build)?

Martijn

On 2 April 2012 09:41, fbrisbart fbrisb...@bestofmedia.com wrote:

Hi,

When you write "I get xxx results", does it come from 'numFound'? Or do you really display xxx results?

When using both field collapsing and sharding, the 'numFound' may be wrong. In that case, think about using the 'shards.rows' parameter with a high value (be careful, it's bad for performance).

If the problem is really about the returned results, it may be because several documents have the same unique key document_id in different shards.

Hope it helps,
Franck

On Friday, 30 March 2012 at 23:52 +, Young, Cody wrote:

I forgot to mention, I can see the distributed requests happening in the logs:

Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core2] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core2&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=2
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core4] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core1] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core1&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core3] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core3&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core0] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core0&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core6] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=0
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core7] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core7&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=3
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core5] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core5&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core4] webapp=/solr path=/select params={distrib=false&group.distributed.second=true&wt=javabin&version=2&rows=10&group.topgroups.group_field=4183765296&group.topgroups.group_field=4608765424&group.topgroups.group_field=3524954944&group.topgroups.group_field=4182445488&group.topgroups.group_field=4213143392&group.topgroups.group_field=4328299312&group.topgroups.group_field=4206259648&group.topgroups.group_field=3465497912&group.topgroups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime
RE: Distributed grouping issue
Okay, I've played with this a bit more. Found something interesting: when the groups returned do not include results from a core, then the core is excluded from the count. (I have 1 group, 2 documents per core.)

Example:

http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=1

<lst name="grouped">
  <lst name="group_field">
    <int name="matches">2</int>

Then, just by changing to rows=2:

http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=2

<lst name="grouped">
  <lst name="group_field">
    <int name="matches">4</int>

Let me know if you have any luck reproducing.

Thanks,
Cody

-----Original Message-----
From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of Martijn v Groningen
Sent: Monday, April 02, 2012 1:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Distributed grouping issue

> All documents of a group exist on a single shard, there are no cross-shard groups.

You only have to partition documents by group when the groupCount and some other features need to be accurate. For the matches this is not necessary. The matches are summed up when merging the shard responses.

I can't reproduce the error you are describing on a small local setup I have here. I have two Solr cores with a simple schema. Each core has 3 documents. When grouping, the matches element returns 6. I'm running on a trunk that I updated 30 minutes ago. Can you try to isolate the problem by testing with a small subset of your data?

Martijn
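
The behaviour Martijn describes (matches summed over all shard responses) gives a simple expected value to compare against what the server returns. A minimal sketch, assuming the per-core match counts are known; `expected_matches` is a hypothetical helper and the numbers are the ones quoted in this thread.

```python
def expected_matches(per_core_matches: dict) -> int:
    """Sum the per-core match counts, as the merge step is described to do."""
    return sum(per_core_matches.values())


# Cody's setup: two cores with 2 matching documents each.
cody = {"core0": 2, "core1": 2}
# Martijn's setup: two cores with 3 documents each.
martijn = {"core0": 3, "core1": 3}

print(expected_matches(cody), expected_matches(martijn))
```

Against that expectation, the rows=1 response above (matches of 2) is short by exactly one core's worth of documents, while the rows=2 response (matches of 4) agrees with the sum.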
Distributed grouping issue
Hi All,

I'm having an issue getting distributed grouping working on trunk (Mar 29, 2012).

If I send this query:

http://localhost:8086/solr/core0/select/?q=*:*&group=false&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7

I get 260,000 results. As soon as I change to using grouping:

http://localhost:8086/solr/core0/select/?q=*:*&group=true&group.field=group_field&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7

I only get 32,000 results (the number of documents in a single core).

The field that I am grouping on is defined as:

<field name="group_field" type="string" indexed="true" stored="true" multiValued="false" />
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

The document id:

<field name="document_id" type="string" indexed="true" stored="true" required="true" />
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<uniqueKey>document_id</uniqueKey>

Anyone else experiencing this? Any ideas?

Thanks,
Cody
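
Long shard lists like the ones in this message are easy to mistype by hand. The following sketch builds the grouped request URL from its parts; the host, core names and field name are the ones from the message, while `grouped_select_url` is a hypothetical helper written for illustration.

```python
from urllib.parse import urlencode


def grouped_select_url(host, cores, group_field, rows=None):
    """Build a distributed, grouped /select URL that queries every core."""
    shards = ",".join(f"{host}/solr/{core}" for core in cores)
    params = [
        ("q", "*:*"),
        ("shards", shards),
        ("group", "true"),
        ("group.field", group_field),
    ]
    if rows is not None:
        params.append(("rows", str(rows)))
    # Keep '*', ':', ',' and '/' unescaped so the URL stays readable.
    query = urlencode(params, safe="*:,/")
    return f"http://{host}/solr/{cores[0]}/select/?{query}"


cores = [f"core{i}" for i in range(8)]
print(grouped_select_url("localhost:8086", cores, "group_field"))
```

Each parameter is joined with an explicit `&`, so the shard list and the grouping parameters cannot run together.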
RE: Distributed grouping issue
From: Young, Cody [mailto:cody.yo...@move.com]
Sent: Friday, March 30, 2012 4:35 PM
To: solr-user@lucene.apache.org
Subject: Distributed grouping issue

Hi All,

I'm having an issue getting distributed grouping working on trunk (Mar 29, 2012).

If I send this query:

http://localhost:8086/solr/core0/select/?q=*:*&group=false&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7

I get 260,000 results. As soon as I change to using grouping:

http://localhost:8086/solr/core0/select/?q=*:*&group=true&group.field=group_field&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7

I only get 32,000 results (the number of documents in a single core).

The field that I am grouping on is defined as:

<field name="group_field" type="string" indexed="true" stored="true" multiValued="false" />
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

The document id:

<field name="document_id" type="string" indexed="true" stored="true" required="true" />
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<uniqueKey>document_id</uniqueKey>

Anyone else experiencing this? Any ideas?

Thanks,
Cody
RE: Solr core swap after rebuild in HA-setup / High-traffic
We have a very similar system. In our case we have a row version field in our SQL database. When we run the full import we keep track of the latest row version at the time that the full import started. Once the full import is done we run an optimize and then run a delta import (actually this is a full import in a different DIH file, same mappings just with some different parameters to the stored procedure that we call). This catches the offline core up to live and then we swap. Why can't you run the delta import on the rebuild core? Is there some specific issue that is preventing you? Cody -Original Message- From: KeesSchepers [mailto:k...@keesschepers.nl] Sent: Wednesday, March 14, 2012 11:59 AM To: solr-user@lucene.apache.org Subject: Solr core swap after rebuild in HA-setup / High-traffic Hello everybody, I am designing a new Solr architecture for one of my clients. This Solr architecture is for a high-traffic website with millions of visitors, but I am facing some design problems where I hope you guys could help me out. In my situation there are 4 Solr servers running: 1 server is the master and 3 are slaves. They are running Solr version 1.4. I use two cores, 'live' and 'rebuild', and I use Solr DIH to rebuild a core, which goes like this: 1. I wipe the reindex core. 2. I run the DIH on the complete dataset (4 million documents) in pieces of 20.000 records (to prevent very long MySQL locks). 3. After the DIH is finished (2 hours) we also have to update the rebuild core with the changes from the last two hours; this is a problem. 4. After updating is done and the core is no more than a few seconds behind, we want to SWAP the cores. Everything goes well except for step 3. The rebuild and the core swap are all okay. Because the website is undergoing changes every minute, we cannot pause the delta-import on the live core and fall behind for 2 hours. 
The problem is that I can't figure out a watertight system that does not delay the live core too long and that uses the DIH instead of a lot of custom code. Did anyone face this problem before, or could anyone give me some tips? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-core-swap-after-rebuild-in-HA-setup-High-traffic-tp3826461p3826461.html Sent from the Solr - User mailing list archive at Nabble.com.
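The sequence Cody describes (full import on the offline core, catch-up import from the remembered row version, then swap) could be sketched roughly as below. This is only an illustration under several assumptions: the base URL, the second handler path /dataimport-catchup, and the row_version request parameter are hypothetical names, not anything given in the thread, and the CoreAdmin path /admin/cores depends on how solr.xml is configured. The code just builds the three request URLs in order; it does not send them.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed base URL, not from the thread

def rebuild_and_swap_urls(start_row_version, live="live", rebuild="rebuild",
                          catchup_handler="/dataimport-catchup"):
    """Return the three requests, in order, for a rebuild followed by a core swap."""
    # 1. Full import into the offline core, wiping it first and optimizing at the end.
    full_import = f"{SOLR}/{rebuild}/dataimport?" + urlencode(
        {"command": "full-import", "clean": "true", "optimize": "true"})
    # 2. Catch-up import through a second DIH config (hypothetical handler path).
    #    clean=false keeps what step 1 indexed; row_version is a hypothetical
    #    parameter that the second config would have to pass on to the database.
    catch_up = f"{SOLR}/{rebuild}{catchup_handler}?" + urlencode(
        {"command": "full-import", "clean": "false",
         "row_version": start_row_version})
    # 3. Swap the caught-up core into place via the CoreAdmin SWAP action.
    swap = f"{SOLR}/admin/cores?" + urlencode(
        {"action": "SWAP", "core": live, "other": rebuild})
    return [full_import, catch_up, swap]

for url in rebuild_and_swap_urls(start_row_version=123456):
    print(url)
```

The important detail is that the row version is recorded before step 1 starts, so the catch-up import covers everything that changed during the two-hour rebuild.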
RE: Problem with result grouping
There's another discussion going on about this on the solr-user mailing list, but I think you're using *.* instead of *:* to match all documents. *.* ends up doing a search against the default field, whereas *:* means match all documents. Cody -Original Message- From: Kissue Kissue [mailto:kissue...@gmail.com] Sent: Tuesday, December 13, 2011 6:06 AM To: solr-user@lucene.apache.org Subject: Problem with result grouping Hi, Maybe there is something I am missing here, but I have a field in my Solr index called categoryId. The field definition is as follows: <field name="categoryId" type="string" indexed="true" stored="true" required="true" /> I am trying to group on this field and I get a result as follows: <str name="groupValue">43201810</str> <result name="doclist" numFound="72" start="0"> This is the query I am sending to Solr: http://localhost:8080/solr/catalogue/select/?q=*.*%0D%0A&version=2.2&start=0&rows=1000&indent=on&group=true&group.field=categoryId My understanding is that this means there are 72 documents in my index that have the value 43201810 for categoryId. Now, surprisingly, when I search my index specifically for categoryId:43201810, expecting to get 72 results, I instead get 124 results. This is the query sent: http://localhost:8080/solr/catalogue/select/?q=categoryId%3A43201810&version=2.2&start=0&rows=10&indent=on Is my understanding of result grouping correct? Is there something I am doing wrong? Any help will be much appreciated. I am using Solr 3.5. Thanks.
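To make the *:* versus *.* point concrete, here is a small Python sketch that builds the grouped request with the match-all query spelled *:*. The helper name grouped_select_url is invented for illustration; the host, core, and field names come from the message. Decoding the query string afterwards confirms what q value Solr would receive.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def grouped_select_url(base_url, field, rows=1000):
    """Build a grouped select URL that matches all documents with q=*:*."""
    params = {
        "q": "*:*",          # match-all query; "*.*" would search the default field instead
        "version": "2.2",
        "start": 0,
        "rows": rows,
        "indent": "on",
        "group": "true",
        "group.field": field,
    }
    return base_url + "?" + urlencode(params)

url = grouped_select_url("http://localhost:8080/solr/catalogue/select/", "categoryId")
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0])  # prints *:*
```

With the match-all query in place, the numFound reported for the 43201810 group can be compared directly against the categoryId:43201810 search.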
RE: avoid overwrite in DataImportHandler
I believe all you need to do is add clean=false to your query string. If you have a unique key set up as your ID in Solr, then it should update the existing documents instead of deleting and re-indexing them. Cody -Original Message- From: P Williams [mailto:williams.tricia.l...@gmail.com] Sent: Thursday, December 08, 2011 11:11 AM To: solr-user@lucene.apache.org Subject: Re: avoid overwrite in DataImportHandler Ah. Thanks Erick. I see now that my question is different from sabman's. Is there a way to use the DataImportHandler's full-import command so that it does not delete the existing material before it begins? Thanks, Tricia On Thu, Dec 8, 2011 at 6:35 AM, Erick Erickson erickerick...@gmail.com wrote: This is all controlled by Solr via the uniqueKey field in your schema. Just remove that entry. But then it's all up to you to handle the fact that there will be multiple documents with the same ID all returned as a result of querying. And it won't matter what program adds data, *nothing* will be overwritten; DIH has no part in that decision. Deduplication is about defining some fields in your record and avoiding adding another document if the contents are close, where close is a slippery concept. I don't think it's related to your problem at all. Best Erick On Wed, Dec 7, 2011 at 3:27 PM, P Williams williams.tricia.l...@gmail.com wrote: Hi, I've wondered the same thing myself. I feel like the clean parameter has something to do with it, but it doesn't work as I'd expect either. Thanks in advance to anyone who can answer this question. *clean* : (default 'true'). Tells whether to clean up the index before the indexing is started. Tricia On Wed, Dec 7, 2011 at 12:49 PM, sabman sab...@gmail.com wrote: I have a unique ID defined for the documents I am indexing. I want to avoid overwriting the documents that have already been indexed. I am using XPathEntityProcessor and TikaEntityProcessor to process the documents. 
The DataImportHandler does not seem to have the option to set overwrite=false. I have read in some other forums that I should use deduplication instead, but I don't see how it is related to my problem. Any help on this (or an explanation of how deduplication would apply to my problem) would be great. Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/avoid-overwrite-in-DataImportHandler-tp3568435p3568435.html Sent from the Solr - User mailing list archive at Nabble.com.
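As a concrete form of Cody's suggestion, the sketch below builds a DIH full-import request with clean=false, so the index is not wiped before the import starts. The helper name full_import_url, the base URL, and the /dataimport handler path are assumptions for illustration. Note that this addresses Tricia's question (do not delete existing material first); it does not by itself stop documents with the same uniqueKey from being replaced, which is sabman's original question.

```python
from urllib.parse import urlencode

def full_import_url(core_url, clean=False, handler="/dataimport"):
    """Build a DIH full-import URL; clean=False leaves existing documents in place."""
    params = {"command": "full-import", "clean": "true" if clean else "false"}
    return core_url.rstrip("/") + handler + "?" + urlencode(params)

# Assumed single-core base URL
print(full_import_url("http://localhost:8983/solr/"))
```

Passing clean explicitly is worthwhile because, as quoted above, the parameter defaults to true.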
RE: debugging failed documents
In my experience with DIH, the errors for failed documents end up in the log files (catalina.out for Tomcat). Can you check your log files? Cody -Original Message- From: Alan Miller [mailto:alan.mill...@gmail.com] Sent: Tuesday, December 06, 2011 1:25 PM To: Solr Subject: debugging failed documents Just getting started with DIH and I have a very simple setup. My dih-config.xml is querying my postgres db and does a select on a crosstab() table that returns just 100 rows. When I do a full-import I see that 22 docs fail, but what debug settings do I have to tweak to see why the docs failed? When I run the same query manually I see all 100 rows. The 22 rows missing in my index look good in the manual output. Alan Sent from my iPhone
RE: Index a null text field
I don't see anything wrong so far other than a typo here (missing a p in the second price): <field column="lastbid_price" name="lastbid_rice" /> Can you see if there are any warnings in the log about documents not being able to be created? Also, you should have a field type definition for text in your schema. It will look something like <fieldtype name="text" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> Can you send the full field type definition along as well? You can also try running a query like: ?q=keyword_stock:[* TO *] That will return any documents where keyword_stock is populated. Thanks, Cody -Original Message- From: jawedshamshedi [mailto:jawedshamsh...@gmail.com] Sent: Thursday, November 24, 2011 9:42 PM To: solr-user@lucene.apache.org Subject: RE: Index a null text field Hi Cody, Thanks for the reply. Please find below the details of what I am doing. Yes, I am using the DataImportHandler, and the relevant snippet from solrconfig.xml is given below. <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler"> <lst name="defaults"> <str name="config">data-config.xml</str> </lst> </requestHandler> The data-config.xml is given below. 
<dataConfig> <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/database?zeroDateTimeBehavior=convertToNull" user="username" password="password"/> <document name="content"> <entity name="auction" query="query(it's working fine)"> <field column="keyword" name="keyword_stock" /> <field column="start_bidprice" name="start_bidprice" /> <field column="end_date" name="end_date" /> <field column="start_date" name="start_date" /> <field column="lastbid_price" name="lastbid_rice" /> <field column="un_id" name="un_id" /> </entity> </document> </dataConfig> schema.xml: <fields> <field name="un_id" type="string" indexed="true" stored="true" required="true" /> <field name="keyword_stock" type="text" indexed="true" stored="true" /> <field name="start_bidprice" type="tlong" indexed="true" stored="true" /> <field name="end_date" type="tdate" indexed="true" stored="true" /> <field name="start_date" type="tdate" indexed="true" stored="true" /> <field name="lastbid_price" type="tfloat" indexed="true" stored="true" /> </fields> <uniqueKey>un_id</uniqueKey> <defaultSearchField>ST_Name</defaultSearchField> The data types in MySQL are given below. keyword text start_bidprice float(12,2) end_date datetime start_bidprice float(12,2) start_date datetime For some fields that are simple floats, the index is being created. I also added zeroDateTimeBehavior=convertToNull to the url in data-config.xml, but to no avail. Please help. Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Index-a-null-text-field-tp3533636p3535376.html Sent from the Solr - User mailing list archive at Nabble.com.
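The lastbid_rice typo that Cody spotted is the kind of mismatch that can be caught mechanically. The following Python sketch compares the name attributes in a DIH config against the field names declared in the schema. The helper name unmapped_dih_fields is made up for illustration, the dataSource element is omitted, and the SQL query is replaced by a placeholder; the field names are the ones posted in this thread.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the posted configuration (dataSource and SQL omitted)
DATA_CONFIG = """
<dataConfig>
  <document name="content">
    <entity name="auction" query="...">
      <field column="keyword" name="keyword_stock" />
      <field column="start_bidprice" name="start_bidprice" />
      <field column="end_date" name="end_date" />
      <field column="start_date" name="start_date" />
      <field column="lastbid_price" name="lastbid_rice" />
      <field column="un_id" name="un_id" />
    </entity>
  </document>
</dataConfig>
"""

SCHEMA_FIELDS = """
<fields>
  <field name="un_id" type="string" indexed="true" stored="true" required="true" />
  <field name="keyword_stock" type="text" indexed="true" stored="true" />
  <field name="start_bidprice" type="tlong" indexed="true" stored="true" />
  <field name="end_date" type="tdate" indexed="true" stored="true" />
  <field name="start_date" type="tdate" indexed="true" stored="true" />
  <field name="lastbid_price" type="tfloat" indexed="true" stored="true" />
</fields>
"""

def unmapped_dih_fields(data_config_xml, schema_fields_xml):
    """Return DIH target field names that are not declared in the schema."""
    schema_names = {f.get("name") for f in ET.fromstring(schema_fields_xml).iter("field")}
    dih_names = [f.get("name") for f in ET.fromstring(data_config_xml).iter("field")]
    return [name for name in dih_names if name not in schema_names]

print(unmapped_dih_fields(DATA_CONFIG, SCHEMA_FIELDS))  # prints ['lastbid_rice']
```

This check only covers explicit field declarations; a schema that relies on dynamicField patterns would need those handled separately.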
RE: Index a null text field
Hello, We'll need more information please. How are you indexing the documents? DataImportHandler? XML updates? Can you show us the relevant parts of your schema (the field definition and the data type for the field)? Are you getting any error messages in the log files? Tell us more about your environment. Windows? Linux? Thanks, Cody -Original Message- From: jawedshamshedi [mailto:jawedshamsh...@gmail.com] Sent: Thursday, November 24, 2011 5:38 AM To: solr-user@lucene.apache.org Subject: Index a null text field Hi all, I am indexing a table that has a field named solr_keywords of type text in MySQL, and it also contains null values. While creating the index in Solr, this field is not getting indexed. Any help will be appreciated. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Index-a-null-text-field-tp3533636p3533636.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Efficient title sorting on large result sets.
Hi Andrew, When you request a sort on a field, Lucene stores every unique value in a field cache, which stays in RAM. If you have a large index and you're sorting on a Unicode string field, this can be very memory intensive. The way that I've solved this in the past is to make a field specifically for sorting, truncate the string to a small number of characters, and sort on that. You have to accept that in some cases the sort order will be wrong. (If you truncate to 6 characters and then sort Thisisastring and Thisisnotastring, you're not guaranteed to get the correct sort order.) The memory benefits of this are two-fold though: you have a shorter string, which takes up less memory, and you have a decreased number of unique values. Cody -Original Message- From: Andrew Ingram [mailto:andrew.ing...@tangentlabs.co.uk] Sent: Monday, November 21, 2011 3:23 AM To: solr-user@lucene.apache.org Subject: Efficient title sorting on large result sets. Hi everyone, We have a large product catalogue (currently 9 million, but soon to inflate to around 25 million) with each product having a Unicode title. We're offering the facility to sort by title, but often within quite large result sets, e.g. 1 million fiction books (we are correctly using filters). Aside from the obviously questionable use of sorting over such a large set of results, I'm wondering if there are any steps I can take to optimise title sorting and minimise memory use. Solr also crashes with OutOfMemoryErrors every couple of days; could this be related to the sorting by title? Or should I be looking for another cause? The machine Solr is on has 8 GB of RAM, 7 of which is given to Solr. We have other sites with larger catalogues and similar spec hardware that aren't having any issues; the title sorting seems to be the only major difference in functionality. I'll be very grateful for any assistance. Regards, Andy Ingram
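The trade-off Cody describes can be seen in a few lines of Python. This is a toy illustration only: the function name sort_key and the sample titles are invented, and in Solr the truncation would be applied when populating the dedicated sort field at index time, not in client code like this.

```python
def sort_key(title, length=6):
    """Truncated sort key: keep only the first `length` characters of the title."""
    return title[:length]

titles = ["Thisisnotastring", "Thisisastring", "Another title"]

# Fewer distinct values to cache: two keys instead of three titles
unique_keys = {sort_key(t) for t in titles}
print(len(unique_keys), "unique keys for", len(titles), "titles")

# Sorting on the truncated key cannot tell the two "Thisis..." titles apart,
# so they keep their input order instead of the true alphabetical order.
by_truncated_key = sorted(titles, key=sort_key)
by_full_title = sorted(titles)
print(by_truncated_key)
print(by_full_title)
```

The two orderings differ only among titles that share the same first six characters, which is the inaccuracy you accept in exchange for the smaller field cache.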