I have encountered another case where deleteByQuery fails. It fails when I
have the catalogueId value "waaaaaaaaaaaawwwwwwwwwwwwwwwwwwqqqqqqqq" and
therefore issue the query
( {!term f=catalogueId}waaaaaaaaaaaawwwwwwwwwwwwwwwwwwqqqqqqqq ).
One of my customers has just reported this. Any ideas why a value like that,
issued in a deleteByQuery, would wipe out the entire index?
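
For reference, this is roughly how the two query forms discussed in this
thread can be built as plain strings before they are handed to
deleteByQuery. It is only a sketch: the class name DeleteQueries and both
method names are made up for illustration and are not part of SolrJ, and it
does not explain the index wipe described above. It only shows the
{!term ...} form and the quoted form side by side.

```java
public final class DeleteQueries {
    private DeleteQueries() {}

    /**
     * Builds "{!term f=<field>}<value>". The value is appended as-is,
     * so spaces in it are not treated as separate query clauses.
     * (Hypothetical helper, not a SolrJ API.)
     */
    public static String termQuery(String field, String value) {
        return "{!term f=" + field + "}" + value;
    }

    /**
     * Builds field:"value", escaping backslashes and double quotes
     * that appear inside the value. (Hypothetical helper, not a SolrJ API.)
     */
    public static String quotedQuery(String field, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // Prints the two query strings for the "Emory Labs" catalogue.
        System.out.println(termQuery("catalogueId", "Emory Labs"));
        System.out.println(quotedQuery("catalogueId", "Emory Labs"));
    }
}
```

Either string would then be passed to getSolrServer().deleteByQuery(...).
Running the same string as an ordinary search first and checking the hit
count is a cheap way to see what a delete would match before sending it.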

Thanks.

On Thu, Sep 27, 2012 at 2:27 PM, Kissue Kissue <kissue...@gmail.com> wrote:

> Actually this problem occurs even when I am doing just deletes. I tested
> by sending only one delete query for a single catalogue and had the same
> problem. I always optimize once.
>
> I changed to the syntax you suggested ( {!term f=catalogueId}Emory Labs)
> and it works like a charm. Thanks for the pointer, it saved me from another
> issue that could have occurred at some point.
>
> Thanks.
>
>
>
>
> On Thu, Sep 27, 2012 at 12:30 PM, Erick Erickson
> <erickerick...@gmail.com> wrote:
>
>> Wild shot in the dark....
>>
>> What happens if you switch from StreamingUpdateSolrServer to
>> HttpSolrServer?
>>
>> What I'm wondering is if somehow you're getting a queueing problem. If you
>> have multiple threads defined for SUSS, it might be possible (and I'm
>> guessing) that the delete bit is getting sent after some of the adds.
>> Frankly I doubt this is the case, but this issue is so weird that I'm
>> grasping at straws.
>>
>> BTW, there's no reason to optimize twice. Actually, the new thinking is that
>> optimizing usually isn't necessary anyway. But if you insist on optimizing
>> there's no reason to do it _both_ after the deletes and after the adds,
>> just do it after the adds.
>>
>> Best
>> Erick
>>
>> On Thu, Sep 27, 2012 at 4:31 AM, Kissue Kissue <kissue...@gmail.com>
>> wrote:
>> > #What is the field type for that field - string or text?
>> >
>> > It is a string type.
>> >
>> > Thanks.
>> >
>> > On Wed, Sep 26, 2012 at 8:14 PM, Jack Krupansky
>> > <j...@basetechnology.com> wrote:
>> >
>> >> What is the field type for that field - string or text?
>> >>
>> >>
>> >> -- Jack Krupansky
>> >>
>> >> -----Original Message----- From: Kissue Kissue
>> >> Sent: Wednesday, September 26, 2012 1:43 PM
>> >>
>> >> To: solr-user@lucene.apache.org
>> >> Subject: Re: Items disappearing from Solr index
>> >>
>> >> # It is looking for documents with "Emory" in the specified field OR
>> >> "Labs" in the default search field.
>> >>
>> >> This does not seem to be the case. For instance issuing a deleteByQuery
>> >> for catalogueId: "PEARL LINGUISTICS LTD" also deletes the contents of a
>> >> catalogueId with the value: "Ncl_MacNaughtonMcGregorCoaching_vf010811".
>> >>
>> >> Thanks.
>> >>
>> >> On Wed, Sep 26, 2012 at 2:37 PM, Jack Krupansky
>> >> <j...@basetechnology.com> wrote:
>> >>
>> >>> It is looking for documents with "Emory" in the specified field OR
>> >>> "Labs" in the default search field.
>> >>>
>> >>> -- Jack Krupansky
>> >>>
>> >>> -----Original Message----- From: Kissue Kissue
>> >>> Sent: Wednesday, September 26, 2012 7:47 AM
>> >>> To: solr-user@lucene.apache.org
>> >>> Subject: Re: Items disappearing from Solr index
>> >>>
>> >>>
>> >>> I have just solved this problem.
>> >>>
>> >>> We have a field called catalogueId. One possible value for this field
>> >>> could be "Emory Labs". I found out that when the following delete by
>> >>> query is sent to solr:
>> >>>
>> >>> getSolrServer().deleteByQuery(catalogueId + ":" + Emory Labs)
>> >>> [Notice that there are no quotes surrounding the catalogueId value -
>> >>> Emory Labs]
>> >>>
>> >>> For some reason this delete by query ends up deleting the contents of
>> >>> some other random catalogues too, which is the reason why we are losing
>> >>> items from the index. When the query is changed to:
>> >>>
>> >>> getSolrServer().deleteByQuery(catalogueId + ":" + "Emory Labs"),
>> >>> then it starts to correctly delete only items in the Emory Labs
>> >>> catalogue.
>> >>>
>> >>> So my first question is, what exactly does deleteByQuery do in the
>> >>> first query without the quotes? How is it determining which catalogues
>> >>> to delete?
>> >>>
>> >>> Secondly, shouldn't the correct behaviour be not to delete anything at
>> >>> all in this case since when a search is done for the same catalogueId
>> >>> without the quotes it just simply returns no results?
>> >>>
>> >>> Thanks.
>> >>>
>> >>>
>> >>> On Mon, Sep 24, 2012 at 3:12 PM, Kissue Kissue <kissue...@gmail.com>
>> >>> wrote:
>> >>>
>> >>>> Hi Erick,
>> >>>>
>> >>>> Thanks for your reply. Yes, I am using delete by query. I am currently
>> >>>> logging the number of items to be deleted before handing off to solr,
>> >>>> and from the solr logs I can see it deleted exactly that number. I will
>> >>>> verify further.
>> >>>>
>> >>>> Thanks.
>> >>>>
>> >>>>
>> >>>> On Mon, Sep 24, 2012 at 1:21 PM, Erick Erickson
>> >>>> <erickerick...@gmail.com> wrote:
>> >>>>
>> >>>>> How do you delete items? By ID or by query?
>> >>>>>
>> >>>>> My guess is that one of two things is happening:
>> >>>>> 1> your delete process is deleting too much data.
>> >>>>> 2> your index process isn't indexing what you think.
>> >>>>>
>> >>>>> I'd add some logging to the SolrJ program to see what
>> >>>>> it thinks it has deleted or added to the index and go from there.
>> >>>>>
>> >>>>> Best
>> >>>>> Erick
>> >>>>>
>> >>>>> On Mon, Sep 24, 2012 at 6:55 AM, Kissue Kissue <kissue...@gmail.com>
>> >>>>> wrote:
>> >>>>> > Hi,
>> >>>>> >
>> >>>>> > I am running Solr 3.5, using SolrJ and using
>> >>>>> > StreamingUpdateSolrServer to index and delete items from solr.
>> >>>>> >
>> >>>>> > I basically index items from the db into solr every night. Existing
>> >>>>> > items can be marked for deletion in the db and a delete request sent
>> >>>>> > to solr to delete such items.
>> >>>>> >
>> >>>>> > My process runs as follows every night:
>> >>>>> >
>> >>>>> > 1. Check if items have been marked for deletion and delete from solr.
>> >>>>> > I commit and optimize after the entire solr deletion runs.
>> >>>>> > 2. Index any new items to solr. I commit and optimize after all the
>> >>>>> > new items have been added.
>> >>>>> >
>> >>>>> > Recently I started noticing that huge chunks of items that have not
>> >>>>> > been marked for deletion are disappearing from the index. I checked
>> >>>>> > the solr logs and the logs indicate that it is deleting exactly the
>> >>>>> > number of items requested, but still a lot of other items disappear
>> >>>>> > from the index from time to time. Any ideas what might be causing
>> >>>>> > this or what I am doing wrong?
>> >>>>> >
>> >>>>> >
>> >>>>> > Thanks.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>
>>
>
>
