RE: Duplicates results when using a non optimized index
Hi,

Yes, it is a very strange problem that we have never encountered before. We are uploading all the documents again to see whether that solves it (hoping that the delete will also remove the multiple document instances).

Greetings,
Tim

Please see our disclaimer, http://www.infosupport.be/Pages/Disclaimer.aspx
Re: Duplicates results when using a non optimized index
On 15-May-08, at 12:50 AM, Tim Mahy wrote:

> Hi, yep it is a very strange problem that we never encountered before. We are uploading all the documents again to see if that solves the problem (hoping that the delete will delete also the multiple document instances)

If you are re-adding everything anyway, execute a delete query for *:* beforehand; that will zap everything. There should never be documents with the same value in the uniqueKey field if you have everything configured correctly.

-Mike
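The delete-everything step Mike describes can be sketched as follows. This is a minimal Python sketch, not something posted in the thread. It assumes Solr's XML update handler is reachable at http://localhost:8983/solr/update (a hypothetical URL; adjust host, port, and path for your own Tomcat deployment), and the helper names are made up for illustration. It posts a delete-by-query for *:* and then a commit.

```python
import urllib.request

# Assumption: hypothetical update URL; change it to match your deployment.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def delete_all_messages():
    """Return the XML update messages: delete everything, then commit."""
    return ["<delete><query>*:*</query></delete>", "<commit/>"]


def build_request(xml_body, url=SOLR_UPDATE_URL):
    """Build (but do not send) an HTTP POST carrying one XML update message."""
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def wipe_index(url=SOLR_UPDATE_URL):
    """Send the delete-all and commit messages in order (hypothetical helper)."""
    for body in delete_all_messages():
        with urllib.request.urlopen(build_request(body, url)) as response:
            response.read()  # drain the reply; HTTP errors raise an exception
```

Calling wipe_index() against a live server removes every document, so it only makes sense on the master right before the full re-index.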
Re: Duplicates results when using a non optimized index
Tim,

Hm, not sure what caused this. 1.2 is now quite old (yes, I know it's the last stable release), so if I were you I would consider moving to 1.3-dev. It sounds like the index is already polluted with duplicate documents, so you'll want to rebuild the index whether you decide to stay with 1.2 or move to 1.3-dev.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Tim Mahy [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, May 14, 2008 3:59:23 AM
Subject: RE: Duplicates results when using a non optimized index

Hi,

thanks for the answer.

- Do duplicates go away after optimization is done?
-- No. Even after the index is optimized, searching it still returns the duplicate results. The same happens when we search on one of the slave servers, which hold the same index through synchronization. By the way, this is the first time we have noticed this. The only problem we have had before was the known "too many open files" issue, which we fixed using ulimit and by restarting the Tomcat server.

- Are the duplicate IDs you are seeing IDs of previously deleted documents?
-- It is possible that these documents were uploaded earlier and have since been replaced.

- Which Solr version are you using, and can you try a recent nightly?
-- We use the 1.2 stable build.

Greetings,
Tim
Duplicates results when using a non optimized index
Hi all,

Is this expected behavior? We have an index like this:

numDocs : 9479963
maxDoc : 12622942
readerImpl : MultiReader

It is in the process of optimizing, and when we search through the index we get this:

<doc>
  <long name="id">15257559</long>
</doc>
<doc>
  <long name="id">15257559</long>
</doc>
<doc>
  <long name="id">17177888</long>
</doc>
<doc>
  <long name="id">11825631</long>
</doc>
<doc>
  <long name="id">11825631</long>
</doc>

The id field is declared like this:

<field name="id" type="long" indexed="true" stored="true" required="true" />

and is set as the unique identity like this in the schema XML:

<uniqueKey>id</uniqueKey>

So the question: is this expected behavior, and if so, is there a way to let Solr return only unique documents?

Greetings and thanks in advance,
Tim
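A client-side check for repeated ids in a result page can be sketched like this. It is a minimal Python sketch under stated assumptions, not a Solr feature: the <response>/<result> wrapper and its attributes, the sample string, and both function names are hypothetical and exist only to make the example self-contained. The <doc> entries reuse the ids from the message above. It only helps spot or hide the symptom on the client; it does not repair the index.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: a hypothetical response wrapper around the <doc> entries quoted above.
SAMPLE_RESPONSE = """<response>
  <result name="response" numFound="5" start="0">
    <doc><long name="id">15257559</long></doc>
    <doc><long name="id">15257559</long></doc>
    <doc><long name="id">17177888</long></doc>
    <doc><long name="id">11825631</long></doc>
    <doc><long name="id">11825631</long></doc>
  </result>
</response>"""


def extract_ids(response_xml, key="id"):
    """Collect the text of every <long name="id"> element, in document order."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iter("long") if el.get("name") == key]


def duplicate_ids(response_xml, key="id"):
    """Map each id that occurs more than once to its number of occurrences."""
    counts = Counter(extract_ids(response_xml, key))
    return {doc_id: n for doc_id, n in counts.items() if n > 1}


def unique_ids(response_xml, key="id"):
    """Return ids with repeats dropped, keeping first-seen order."""
    return list(dict.fromkeys(extract_ids(response_xml, key)))
```

For the sample above, duplicate_ids(SAMPLE_RESPONSE) reports 15257559 and 11825631 as each occurring twice.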
Re: Duplicates results when using a non optimized index
Hm, not sure why that is happening, but here is some info on the other points in your email:

- There should be no duplicates even if you are searching an index that is being optimized.
- Why are you searching an index that is being optimized? It's doable, but people typically perform index-modifying operations on a Solr master and read-only operations on Solr query slave(s).
- Do duplicates go away after optimization is done?
- Are the duplicate IDs you are seeing IDs of previously deleted documents?
- Which Solr version are you using, and can you try a recent nightly?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch