I have encountered another case where deleteByQuery fails. It happens when I have the catalogueId value "waaaaaaaaaaaawwwwwwwwwwwwwwwwwwqqqqqqqq" and therefore issue the query {!term f=catalogueId}waaaaaaaaaaaawwwwwwwwwwwwwwwwwwqqqqqqqq. One of my customers has just reported this. Any ideas why a value like that, when issued in a deleteByQuery, would wipe out the entire index?
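For reference, here is a minimal sketch of the three query-string forms discussed in this thread: the unquoted concatenation, the quoted phrase, and the {!term} form. The class and method names (DeleteQueries, naive, quoted, term) are hypothetical and only build strings; the result would then be handed to deleteByQuery. It does not explain the failure described above, it only makes the three forms easy to compare.

```java
// Hypothetical helper (not part of SolrJ): builds the query strings
// that would be passed to deleteByQuery. Pure string handling, no Solr calls.
final class DeleteQueries {
    private DeleteQueries() {}

    // Unquoted form from the original code. For a value containing a space,
    // such as "Emory Labs", only the first word is tied to the field; per
    // Jack's reply, the second word is searched in the default field.
    static String naive(String field, String value) {
        return field + ":" + value;
    }

    // Quoted form: the value is wrapped in double quotes, with embedded
    // backslashes and double quotes escaped so the phrase stays intact.
    static String quoted(String field, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    // Term query parser form suggested in the thread: {!term f=field}value.
    // The value is appended as-is, with no quoting or escaping.
    static String term(String field, String value) {
        return "{!term f=" + field + "}" + value;
    }

    public static void main(String[] args) {
        String value = "Emory Labs";
        System.out.println(naive("catalogueId", value));   // catalogueId:Emory Labs
        System.out.println(quoted("catalogueId", value));  // catalogueId:"Emory Labs"
        System.out.println(term("catalogueId", value));    // {!term f=catalogueId}Emory Labs
    }
}
```

Logging the exact string produced by whichever form is used, just before the delete is sent, would show whether the query reaching Solr is the one intended.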
Thanks.

On Thu, Sep 27, 2012 at 2:27 PM, Kissue Kissue <kissue...@gmail.com> wrote:

> Actually this problem occurs even when I am doing just deletes. I tested
> by sending only one delete query for a single catalogue and had the same
> problem. I always optimize once.
>
> I changed to the syntax you suggested ( {!term f=catalogueId}Emory Labs)
> and it works like a charm. Thanks for the pointer, saved me from another
> issue that could have occurred at some point.
>
> Thanks.
>
> On Thu, Sep 27, 2012 at 12:30 PM, Erick Erickson
> <erickerick...@gmail.com> wrote:
>
>> Wild shot in the dark....
>>
>> What happens if you switch from StreamingUpdateSolrServer to
>> HttpSolrServer?
>>
>> What I'm wondering is if somehow you're getting a queueing problem. If
>> you have multiple threads defined for SUSS, it might be possible (and
>> I'm guessing) that the delete bit is getting sent after some of the
>> adds. Frankly I doubt this is the case, but this issue is so weird
>> that I'm grasping at straws.
>>
>> BTW, there's no reason to optimize twice. Actually, the new thinking
>> is that optimizing usually isn't necessary anyway. But if you insist
>> on optimizing there's no reason to do it _both_ after the deletes and
>> after the adds, just do it after the adds.
>>
>> Best
>> Erick
>>
>> On Thu, Sep 27, 2012 at 4:31 AM, Kissue Kissue <kissue...@gmail.com>
>> wrote:
>>
>>> # What is the field type for that field - string or text?
>>>
>>> It is a string type.
>>>
>>> Thanks.
>>>
>>> On Wed, Sep 26, 2012 at 8:14 PM, Jack Krupansky
>>> <j...@basetechnology.com> wrote:
>>>
>>>> What is the field type for that field - string or text?
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> -----Original Message-----
>>>> From: Kissue Kissue
>>>> Sent: Wednesday, September 26, 2012 1:43 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Items disappearing from Solr index
>>>>
>>>> # It is looking for documents with "Emory" in the specified field
>>>> # OR "Labs" in the default search field.
>>>>
>>>> This does not seem to be the case. For instance, issuing a
>>>> deleteByQuery for catalogueId: "PEARL LINGUISTICS LTD" also deletes
>>>> the contents of a catalogueId with the value:
>>>> "Ncl_MacNaughtonMcGregorCoaching_vf010811".
>>>>
>>>> Thanks.
>>>>
>>>> On Wed, Sep 26, 2012 at 2:37 PM, Jack Krupansky
>>>> <j...@basetechnology.com> wrote:
>>>>
>>>>> It is looking for documents with "Emory" in the specified field
>>>>> OR "Labs" in the default search field.
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> -----Original Message-----
>>>>> From: Kissue Kissue
>>>>> Sent: Wednesday, September 26, 2012 7:47 AM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: Items disappearing from Solr index
>>>>>
>>>>> I have just solved this problem.
>>>>>
>>>>> We have a field called catalogueId. One possible value for this
>>>>> field could be "Emory Labs". I found out that when the following
>>>>> delete by query is sent to solr:
>>>>>
>>>>> getSolrServer().deleteByQuery(catalogueId + ":" + Emory Labs)
>>>>> [Notice that there are no quotes surrounding the catalogueId
>>>>> value - Emory Labs]
>>>>>
>>>>> For some reason this delete by query ends up deleting the
>>>>> contents of some other random catalogues too, which is the reason
>>>>> why we are losing items from the index. When the query is changed
>>>>> to:
>>>>>
>>>>> getSolrServer().deleteByQuery(catalogueId + ":" + "Emory Labs"),
>>>>> then it starts to correctly delete only items in the Emory Labs
>>>>> catalogue.
>>>>>
>>>>> So my first question is, what exactly does deleteByQuery do in
>>>>> the first query without the quotes? How is it determining which
>>>>> catalogues to delete?
>>>>>
>>>>> Secondly, shouldn't the correct behaviour be not to delete
>>>>> anything at all in this case, since when a search is done for the
>>>>> same catalogueId without the quotes it simply returns no results?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> On Mon, Sep 24, 2012 at 3:12 PM, Kissue Kissue
>>>>> <kissue...@gmail.com> wrote:
>>>>>
>>>>>> Hi Erick,
>>>>>>
>>>>>> Thanks for your reply. Yes, I am using delete by query. I am
>>>>>> currently logging the number of items to be deleted before
>>>>>> handing off to solr. And from the solr logs I can see it deleted
>>>>>> exactly that number. I will verify further.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> On Mon, Sep 24, 2012 at 1:21 PM, Erick Erickson
>>>>>> <erickerick...@gmail.com> wrote:
>>>>>>
>>>>>>> How do you delete items? By ID or by query?
>>>>>>>
>>>>>>> My guess is that one of two things is happening:
>>>>>>> 1> your delete process is deleting too much data.
>>>>>>> 2> your index process isn't indexing what you think.
>>>>>>>
>>>>>>> I'd add some logging to the SolrJ program to see what
>>>>>>> it thinks it has deleted or added to the index and go from
>>>>>>> there.
>>>>>>>
>>>>>>> Best
>>>>>>> Erick
>>>>>>>
>>>>>>> On Mon, Sep 24, 2012 at 6:55 AM, Kissue Kissue
>>>>>>> <kissue...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am running Solr 3.5, using SolrJ and using
>>>>>>>> StreamingUpdateSolrServer to index and delete items from solr.
>>>>>>>>
>>>>>>>> I basically index items from the db into solr every night.
>>>>>>>> Existing items can be marked for deletion in the db and a
>>>>>>>> delete request sent to solr to delete such items.
>>>>>>>>
>>>>>>>> My process runs as follows every night:
>>>>>>>>
>>>>>>>> 1. Check if items have been marked for deletion and delete
>>>>>>>> from solr. I commit and optimize after the entire solr
>>>>>>>> deletion runs.
>>>>>>>> 2. Index any new items to solr. I commit and optimize after
>>>>>>>> all the new items have been added.
>>>>>>>>
>>>>>>>> Recently I started noticing that huge chunks of items that
>>>>>>>> have not been marked for deletion are disappearing from the
>>>>>>>> index. I checked the solr logs and the logs indicate that it
>>>>>>>> is deleting exactly the number of items requested, but still a
>>>>>>>> lot of other items disappear from the index from time to time.
>>>>>>>> Any ideas what might be causing this or what I am doing wrong?
>>>>>>>>
>>>>>>>> Thanks.