RE: Duplicate results when using a non-optimized index

2008-05-15 Thread Tim Mahy
Hi,

yep it is a very strange problem that we never encountered before.
We are uploading all the documents again to see if that solves the problem 
(hoping that the delete will also remove the multiple document instances)

greetings,
Tim

From: Otis Gospodnetic [EMAIL PROTECTED]
Sent: Wednesday, 14 May 2008 23:18
To: solr-user@lucene.apache.org
Subject: Re: Duplicate results when using a non-optimized index

Tim,

Hm, not sure what caused this.  1.2 is now quite old (yes, I know it's the last 
stable release), so if I were you I would consider moving to 1.3-dev.  It 
sounds like the index is already polluted with duplicate documents, so you'll 
want to rebuild the index whether you decide to stay with 1.2 or move to 
1.3-dev.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
 From: Tim Mahy [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Wednesday, May 14, 2008 3:59:23 AM
 Subject: RE: Duplicate results when using a non-optimized index

 Hi,

 thanks for the answer,

 - do duplicates go away after optimization is done?
 -- no, even after the index is optimized we still get the duplicate results, and
 the same happens if we search on one of the slave servers which have
 the same index through synchronization ...
 btw, this is the first time we have noticed this; the only thing we have had before was
 the known "too many open files" problem, which we fixed by raising the ulimit and
 restarting the Tomcat server 

 - are the duplicate IDs that you are seeing IDs of previously deleted documents?
 -- it is possible that these documents were uploaded earlier and have been
 replaced...

 - which Solr version are you using and can you try a recent nightly?
 -- we use the 1.2 stable build

 greetings,
 Tim
 
 From: Otis Gospodnetic [EMAIL PROTECTED]
 Sent: Wednesday, 14 May 2008 6:11
 To: solr-user@lucene.apache.org
 Subject: Re: Duplicate results when using a non-optimized index

 Hm, not sure why that is happening, but here is some info regarding other 
 stuff
 from your email

 - there should be no duplicates even if you are searching an index that is 
 being
 optimized
 - why are you searching an index that is being optimized?  It's doable, but
 people typically perform index-modifying operations on a Solr master and
 read-only operations on Solr query slave(s)
 - do duplicates go away after optimization is done?
 - are the duplicate IDs that you are seeing IDs of previously deleted documents?
 - which Solr version are you using and can you try a recent nightly?


 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


 - Original Message 
  From: Tim Mahy
  To: solr-user@lucene.apache.org
  Sent: Tuesday, May 13, 2008 5:59:28 AM
  Subject: Duplicate results when using a non-optimized index
 
  Hi all,
 
 is this expected behavior when we have an index like this :
 
  numDocs : 9479963
  maxDoc : 12622942
  readerImpl : MultiReader
 
 which is in the process of being optimized, and when we search through the index we
 get this :
 
 
  <doc>
  <long name="id">15257559</long>
  </doc>
  <doc>
  <long name="id">15257559</long>
  </doc>
  <doc>
  <long name="id">17177888</long>
  </doc>
  <doc>
  <long name="id">11825631</long>
  </doc>
  <doc>
  <long name="id">11825631</long>
  </doc>

  The id field is declared like this :
  <field name="id" type="long" indexed="true" stored="true" required="true" />

  and is set as the unique identity like this in the schema xml :
  <uniqueKey>id</uniqueKey>
 
  so the question : is this expected behavior and if so is there a way to let
 Solr
  only return unique documents ?
 
  greetings and thanx in advance,
  Tim
 
 
 
 


Re: Duplicate results when using a non-optimized index

2008-05-15 Thread Mike Klaas


On 15-May-08, at 12:50 AM, Tim Mahy wrote:


Hi,

yep it is a very strange problem that we never encountered before.
We are uploading all the documents again to see if that solves the
problem (hoping that the delete will also remove the multiple
document instances)


If you are re-adding everything anyway, execute a delete query *:*
beforehand -- that'll zap everything.
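
For example, a minimal sketch (assuming the stock single-core /update handler
on localhost:8983; adjust host, port, and path for your setup):

  # delete every document in the index
  curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' \
    --data-binary '<delete><query>*:*</query></delete>'
  # commit so the deletes take effect before you start re-adding
  curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' \
    --data-binary '<commit/>'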


There should never be documents with the same value in the uniqueKey field
if you have everything configured correctly.


-Mike 


Re: Duplicate results when using a non-optimized index

2008-05-14 Thread Otis Gospodnetic
Tim,

Hm, not sure what caused this.  1.2 is now quite old (yes, I know it's the last 
stable release), so if I were you I would consider moving to 1.3-dev.  It 
sounds like the index is already polluted with duplicate documents, so you'll 
want to rebuild the index whether you decide to stay with 1.2 or move to 
1.3-dev.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
 From: Tim Mahy [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Wednesday, May 14, 2008 3:59:23 AM
 Subject: RE: Duplicate results when using a non-optimized index
 
 Hi,
 
 thanks for the answer,
 
 - do duplicates go away after optimization is done?
 -- no, even after the index is optimized we still get the duplicate results, and
 the same happens if we search on one of the slave servers which have
 the same index through synchronization ...
 btw, this is the first time we have noticed this; the only thing we have had before was
 the known "too many open files" problem, which we fixed by raising the ulimit and
 restarting the Tomcat server 
 
 - are the duplicate IDs that you are seeing IDs of previously deleted documents?
 -- it is possible that these documents were uploaded earlier and have been
 replaced...
 
 - which Solr version are you using and can you try a recent nightly?
 -- we use the 1.2 stable build
 
 greetings,
 Tim
 
 From: Otis Gospodnetic [EMAIL PROTECTED]
 Sent: Wednesday, 14 May 2008 6:11
 To: solr-user@lucene.apache.org
 Subject: Re: Duplicate results when using a non-optimized index
 
 Hm, not sure why that is happening, but here is some info regarding other 
 stuff 
 from your email
 
 - there should be no duplicates even if you are searching an index that is 
 being 
 optimized
 - why are you searching an index that is being optimized?  It's doable, but 
 people typically perform index-modifying operations on a Solr master and 
 read-only operations on Solr query slave(s)
 - do duplicates go away after optimization is done?
 - are the duplicate IDs that you are seeing IDs of previously deleted documents?
 - which Solr version are you using and can you try a recent nightly?
 
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 - Original Message 
  From: Tim Mahy 
  To: solr-user@lucene.apache.org 
  Sent: Tuesday, May 13, 2008 5:59:28 AM
  Subject: Duplicate results when using a non-optimized index
 
  Hi all,
 
 is this expected behavior when we have an index like this :
 
  numDocs : 9479963
  maxDoc : 12622942
  readerImpl : MultiReader
 
 which is in the process of being optimized, and when we search through the index we
 get this :
 
 
  <doc>
  <long name="id">15257559</long>
  </doc>
  <doc>
  <long name="id">15257559</long>
  </doc>
  <doc>
  <long name="id">17177888</long>
  </doc>
  <doc>
  <long name="id">11825631</long>
  </doc>
  <doc>
  <long name="id">11825631</long>
  </doc>

  The id field is declared like this :
  <field name="id" type="long" indexed="true" stored="true" required="true" />

  and is set as the unique identity like this in the schema xml :
  <uniqueKey>id</uniqueKey>
 
  so the question : is this expected behavior and if so is there a way to let 
 Solr
  only return unique documents ?
 
  greetings and thanx in advance,
  Tim
 
 
 
 



Duplicate results when using a non-optimized index

2008-05-13 Thread Tim Mahy
Hi all,

is this expected behavior when we have an index like this :

numDocs : 9479963
maxDoc : 12622942
readerImpl : MultiReader

which is in the process of being optimized, and when we search through the index we 
get this :

<doc>
<long name="id">15257559</long>
</doc>
<doc>
<long name="id">15257559</long>
</doc>
<doc>
<long name="id">17177888</long>
</doc>
<doc>
<long name="id">11825631</long>
</doc>
<doc>
<long name="id">11825631</long>
</doc>

The id field is declared like this :
<field name="id" type="long" indexed="true" stored="true" required="true" />

and is set as the unique identity like this in the schema xml :
  <uniqueKey>id</uniqueKey>

so the question : is this expected behavior and if so is there a way to let 
Solr only return unique documents ?

greetings and thanx in advance,
Tim






Re: Duplicate results when using a non-optimized index

2008-05-13 Thread Otis Gospodnetic
Hm, not sure why that is happening, but here is some info regarding other stuff 
from your email

- there should be no duplicates even if you are searching an index that is 
being optimized
- why are you searching an index that is being optimized?  It's doable, but 
people typically perform index-modifying operations on a Solr master and 
read-only operations on Solr query slave(s)
- do duplicates go away after optimization is done? (see the sketch after this list)
- are the duplicate IDs that you are seeing IDs of previously deleted documents?
- which Solr version are you using and can you try a recent nightly?
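
A quick way to check the first point (just a sketch; the URLs assume a default
single-core Solr on localhost:8983, and the id is one of the duplicated values
from your mail) is to force an optimize and then re-query one of the duplicated ids:

  # force an explicit optimize
  curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' \
    --data-binary '<optimize/>'
  # re-query a duplicated id; numFound should drop to 1 once the duplicates are gone
  curl 'http://localhost:8983/solr/select?q=id:15257559&fl=id&rows=10'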


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
 From: Tim Mahy [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Tuesday, May 13, 2008 5:59:28 AM
 Subject: Duplicate results when using a non-optimized index
 
 Hi all,
 
 is this expected behavior when we have an index like this :
 
 numDocs : 9479963
 maxDoc : 12622942
 readerImpl : MultiReader
 
 which is in the process of being optimized, and when we search through the index we
 get this :
 
 
 <doc>
 <long name="id">15257559</long>
 </doc>
 <doc>
 <long name="id">15257559</long>
 </doc>
 <doc>
 <long name="id">17177888</long>
 </doc>
 <doc>
 <long name="id">11825631</long>
 </doc>
 <doc>
 <long name="id">11825631</long>
 </doc>

 The id field is declared like this :
 <field name="id" type="long" indexed="true" stored="true" required="true" />

 and is set as the unique identity like this in the schema xml :
 <uniqueKey>id</uniqueKey>
 
 so the question : is this expected behavior and if so is there a way to let 
 Solr 
 only return unique documents ?
 
 greetings and thanx in advance,
 Tim
 
 
 
 