How are you querying Solr? You say you query for 100 docs,
update them, then get the next set. What are you using for a marker?
If you're using the start parameter and somehow a commit is
creeping in, things might be weird, especially if you're relying on
any of the internal Lucene doc IDs. If you're absolutely sure no
commits are taking place, even that should be OK.
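
For illustration, the fragile pattern would look roughly like this in
SolrJ (the zkHost, collection, and query are made up; this is the
pattern to avoid, not a recommendation):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class StartParamWalk {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("zkhost:2181");
        client.setDefaultCollection("collection1");

        SolrQuery q = new SolrQuery("myfield:*accidental*");
        q.setRows(100);

        int start = 0;
        while (true) {
            q.setStart(start);
            QueryResponse rsp = client.query(q);
            if (rsp.getResults().isEmpty()) break;
            // ... modify and re-add this batch here ...
            // If a commit lands between iterations, the ordering of the
            // full result set can change, so this offset may now skip
            // some docs and return others twice.
            start += 100;
        }
        client.close();
    }
}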

The "deep paging" stuff could be helpful here, see:
https://lucidworks.com/blog/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
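
A rough SolrJ sketch of that cursor-based iteration (same made-up
names as above; note that cursorMark requires a sort ending on the
uniqueKey field):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorWalk {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("zkhost:2181");
        client.setDefaultCollection("collection1");

        SolrQuery q = new SolrQuery("myfield:*accidental*");
        q.setRows(100);
        // uniqueKey tie-breaker makes the cursor stable across pages
        q.setSort(SolrQuery.SortClause.asc("id"));

        String cursor = CursorMarkParams.CURSOR_MARK_START; // "*"
        while (true) {
            q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
            QueryResponse rsp = client.query(q);
            // ... modify and re-add rsp.getResults() here ...
            String next = rsp.getNextCursorMark();
            if (cursor.equals(next)) break; // no more results
            cursor = next;
        }
        client.close();
    }
}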

Best,
Erick

On Fri, Sep 25, 2015 at 3:13 PM, Ravi Solr <ravis...@gmail.com> wrote:
> No problem, Walter, it's all fun. I was just wondering if there was some
> other good way that I did not know of, that's all 😀
>
> Thanks
>
> Ravi Kiran Bhaskar
>
> On Friday, September 25, 2015, Walter Underwood <wun...@wunderwood.org>
> wrote:
>
>> Sorry, I did not mean to be rude. The original question did not say that
>> you don’t have the docs outside of Solr. Some people jump to the advanced
>> features and miss the simple ones.
>>
>> It might be faster to fetch all the docs from Solr and save them in files.
>> Then modify them. Then reload all of them. No guarantee, but it is worth a
>> try.
>>
>> Good luck.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>
>> > On Sep 25, 2015, at 2:59 PM, Ravi Solr <ravis...@gmail.com> wrote:
>> >
>> > Walter, not in a mood for banter right now.... It's 6:00 PM on a Friday
>> > and I am stuck here trying to figure out reindexing issues :-)
>> > I don't have the source of the docs, so I have to query Solr, modify the
>> > docs, and put them back, and that is proving to be quite a task in 5.3.0.
>> > I did reindex several times with 4.7.2 in a master-slave env without any
>> > issue. Since then we have moved to cloud and it has been a pain all day.
>> >
>> > Thanks
>> >
>> > Ravi Kiran Bhaskar
>> >
>> > On Fri, Sep 25, 2015 at 5:25 PM, Walter Underwood
>> > <wun...@wunderwood.org> wrote:
>> >
>> >> Sure.
>> >>
>> >> 1. Delete all the docs (no commit).
>> >> 2. Add all the docs (no commit).
>> >> 3. Commit.
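>> >>
>> >> In SolrJ that's roughly (zkHost, collection, fields, and the
>> >> doc-building helper are all placeholders):
>> >>
>> >> import java.util.ArrayList;
>> >> import java.util.List;
>> >> import org.apache.solr.client.solrj.impl.CloudSolrClient;
>> >> import org.apache.solr.common.SolrInputDocument;
>> >>
>> >> public class FullReload {
>> >>     public static void main(String[] args) throws Exception {
>> >>         CloudSolrClient client = new CloudSolrClient("zkhost:2181");
>> >>         client.setDefaultCollection("collection1");
>> >>
>> >>         client.deleteByQuery("*:*");      // 1. delete all, no commit
>> >>         client.add(buildCorrectedDocs()); // 2. add all, still no commit
>> >>         client.commit();                  // 3. one commit exposes it all
>> >>         client.close();
>> >>     }
>> >>
>> >>     // Placeholder: stream your corrected docs from wherever they live.
>> >>     static List<SolrInputDocument> buildCorrectedDocs() {
>> >>         List<SolrInputDocument> docs = new ArrayList<>();
>> >>         SolrInputDocument d = new SolrInputDocument();
>> >>         d.addField("id", "doc-1");
>> >>         d.addField("myfield", "cleaned value");
>> >>         docs.add(d);
>> >>         return docs;
>> >>     }
>> >> }
>> >>
>> >> Searchers keep serving the old index until the commit in step 3,
>> >> assuming no autoCommit opens a new searcher in between.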
>> >>
>> >> wunder
>> >> Walter Underwood
>> >> wun...@wunderwood.org
>> >> http://observer.wunderwood.org/  (my blog)
>> >>
>> >>
>> >>> On Sep 25, 2015, at 2:17 PM, Ravi Solr <ravis...@gmail.com> wrote:
>> >>>
>> >>> I have been trying to re-index the docs (about 1.5 million) because one
>> >>> of the fields needed part of a string value (accidentally introduced)
>> >>> removed. I was issuing a query for 100 docs, getting 4 fields, and
>> >>> updating the docs (atomic update with "set") via the CloudSolrClient in
>> >>> batches. However, from time to time the query returns 0 results, which
>> >>> exits the re-indexing program.
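>> >>>
>> >>> For reference, the per-doc update does roughly this (zkHost,
>> >>> collection, and field names changed):
>> >>>
>> >>> import java.util.HashMap;
>> >>> import java.util.Map;
>> >>> import org.apache.solr.client.solrj.impl.CloudSolrClient;
>> >>> import org.apache.solr.common.SolrInputDocument;
>> >>>
>> >>> public class AtomicSet {
>> >>>     public static void main(String[] args) throws Exception {
>> >>>         CloudSolrClient client = new CloudSolrClient("zkhost:2181");
>> >>>         client.setDefaultCollection("collection1");
>> >>>
>> >>>         SolrInputDocument doc = new SolrInputDocument();
>> >>>         doc.addField("id", "doc-123"); // uniqueKey of the doc to patch
>> >>>         Map<String, Object> op = new HashMap<>();
>> >>>         op.put("set", "cleaned value"); // "set" replaces the old value
>> >>>         doc.addField("myfield", op);
>> >>>
>> >>>         client.add(doc); // batched 100 at a time in the real program
>> >>>         client.close();
>> >>>     }
>> >>> }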
>> >>>
>> >>> I can't understand why the cloud returns 0 results when there are 1.4
>> >>> million docs which still have the "accidental" string in them.
>> >>>
>> >>> Is there another way to do massive bulk updates?
>> >>>
>> >>> Thanks
>> >>>
>> >>> Ravi Kiran Bhaskar
>> >>
>> >>
>>
>>
