bq: 3. Erick, I wasn't getting all 1.4 million in one shot. I was initially using
100-doc batches, which I later increased to 500 docs per batch. Also it
would not be an infinite loop if I commit for each batch, right !!??

That's not the point at all. Look at the basic logic here:

You run for a while, processing 100 (or 500 or 1,000) docs per batch,
and change all the uuid fields with this statement:

uuid.replace("sun.org.mozilla.javascript.internal.NativeString:", "");

and then update the doc. You run this as long as you have any docs
that satisfy the query "q=uuid:sun.org.mozilla*", _changing_
every one that has this string!

At that point, theoretically, no document in your index has this string. So
running your update program immediately after should find _zero_ documents.
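
Concretely, one pass of that loop looks something like this. This is just a
sketch with placeholder names (client, fixOneBatch), not your actual code, and it
assumes atomic updates are available (uniqueKey defined, <updateLog/> enabled,
other fields stored):

import java.util.Collections;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

void fixOneBatch(SolrClient client) throws Exception {
  SolrQuery q = new SolrQuery("uuid:sun.org.mozilla*")
      .setRows(500)
      .setFields("uniqueId", "uuid");
  QueryResponse rsp = client.query(q);
  for (SolrDocument doc : rsp.getResults()) {
    String uuid = (String) doc.getFieldValue("uuid");
    // String.replace() returns a new String; the result has to be assigned
    // and written back, the original field value is untouched otherwise.
    String fixed = uuid.replace("sun.org.mozilla.javascript.internal.NativeString:", "");
    SolrInputDocument update = new SolrInputDocument();
    update.addField("uniqueId", doc.getFieldValue("uniqueId"));
    update.addField("uuid", Collections.singletonMap("set", fixed)); // atomic "set"
    client.add(update);
  }
  client.commit();
}

Once a pass like that has committed and a new searcher has opened, those docs no
longer match uuid:sun.org.mozilla*, which is why a re-run finds fewer and fewer
of them.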

I've been assuming your complaint is that you don't process 1.4 M docs (in
batches); you process some lower number, then exit, and you think this is wrong.
I'm claiming that you should only expect to find as many docs as have been
indexed since the last time the program ran.

As far as the infinite loop is concerned, again trace the logic in the old code.
Forget about commits and all the mechanics, just look at the logic.
You're querying on "sun.org.mozilla*". But you only change if you get a match on
"sun.org.mozilla.javascript.internal.NativeString:"

Now imagine you have a doc that has sun.org.mozilla.erick in it. That doc gets
returned from the query but does _not_ get modified because it doesn't
match your pattern. In the older code, it would be found again and returned next
time you queried. Then not modified again. Eventually you'd be in a position
where you never changed any docs, just kept getting the same docList back
over and over again. Marching through based on the unique key should not
have the same potential issue.
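
If it helps, this is the shape of the "march by unique key" approach I mean. A
sketch only, reusing the imports from the earlier one plus
org.apache.solr.common.SolrDocumentList:

// The marker advances past every doc the query returns, even ones you don't
// modify, so a doc like sun.org.mozilla.erick can't make you loop forever.
String marker = "*";                                   // open lower bound for the first pass
while (true) {
  SolrQuery q = new SolrQuery("+uuid:sun.org.mozilla* +uniqueId:{" + marker + " TO *]")
      .setRows(500)
      .addSort("uniqueId", SolrQuery.ORDER.asc)
      .setFields("uniqueId", "uuid");
  SolrDocumentList docs = client.query(q).getResults();
  if (docs.isEmpty()) break;                           // nothing left that matches -> done
  for (SolrDocument doc : docs) {
    // ... fix and re-add the doc as in the earlier sketch ...
    marker = (String) doc.getFieldValue("uniqueId");   // remember the last key seen
  }
  client.commit();                                     // commit each batch
}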

You should not be mixing the uniqueId-range query with CURSORMARK. Deep paging
assumes the exact same query is being run over and over and you're _paging_
through the results. You're changing the query every time, so the results aren't
very predictable.
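
For contrast, here's the shape of plain cursorMark deep paging when the query is
fixed: only the cursor value changes between requests. Again just a sketch
(untested, imports as above plus org.apache.solr.common.params.CursorMarkParams):

SolrQuery q = new SolrQuery("uuid:sun.org.mozilla*")        // query stays exactly the same
    .setRows(500)
    .addSort("uniqueId", SolrQuery.ORDER.asc)               // sort must include the uniqueKey
    .setFields("uniqueId", "uuid");
String cursorMark = CursorMarkParams.CURSOR_MARK_START;     // "*"
while (true) {
  q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
  QueryResponse rsp = client.query(q);
  // ... process rsp.getResults() here ...
  String next = rsp.getNextCursorMark();
  if (cursorMark.equals(next)) break;                       // cursor didn't advance -> done
  cursorMark = next;
}

If you want to march by uniqueId instead, drop the cursorMark entirely. Pick one
mechanism or the other.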

Best,
Erick


On Sat, Sep 26, 2015 at 5:01 PM, Ravi Solr <ravis...@gmail.com> wrote:
> Erick & Shawn, I incorporated your suggestions.
>
>
> 0. Shut off all other indexing processes.
> 1. As Shawn mentioned, set the batch size to 10000.
> 2. Loved Erick's suggestion about not using a filter at all: sort by
> uniqueId and use the last known uniqueId as the next query's start, while still
> using cursor marks as follows
>
> SolrQuery q = new SolrQuery("+uuid:sun.org.mozilla* +uniqueId:{" + markerSysId + " TO *]")
>         .setRows(10000)
>         .addSort("uniqueId", ORDER.asc)
>         .setFields(new String[]{"uniqueId", "uuid"});
> q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
>
> 3. As per Shawn's advice, commented out autocommit and soft commit in
> solrconfig.xml, set openSearcher to false, and issued a MANUAL COMMIT for
> every batch from code as follows
>
> client.commit(true, true, true);  // waitFlush, waitSearcher, softCommit
>
> Here is the log statement and its output - log.info("Indexed " + count +
> "/" + docList.getNumFound());
>
>
> 2015-09-26 17:29:57 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 90000/1344085
> 2015-09-26 17:30:30 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 100000/1334085
> 2015-09-26 17:33:26 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 110000/1324085
> 2015-09-26 17:36:09 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 120000/1314085
> 2015-09-26 17:39:42 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 130000/1304085
> 2015-09-26 17:43:05 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 140000/1294085
> 2015-09-26 17:46:14 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 150000/1284085
> 2015-09-26 17:48:22 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 160000/1274085
> 2015-09-26 17:48:25 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 160000/0
> 2015-09-26 17:48:25 INFO  [a.b.c.AdhocCorrectUUID] - FINISHED !!!
>
> Ran it manually a second time to see if the first run was a fluke. Still the same.
>
> 2015-09-26 17:55:26 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 10000/1264716
> 2015-09-26 17:58:07 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 20000/1254716
> 2015-09-26 18:03:09 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 30000/1244716
> 2015-09-26 18:06:32 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 40000/1234716
> 2015-09-26 18:10:35 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 50000/1224716
> 2015-09-26 18:15:23 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 60000/1214716
> 2015-09-26 18:15:24 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 60000/0
> 2015-09-26 18:15:26 INFO  [a.b.c.AdhocCorrectUUID] - FINISHED !!!
>
> Now changed the autoCommit in solrconfig.xml as follows... Note the soft
> commit has been shut off as per Shawn's advice
>
>     <autoCommit>
>        <!-- <maxDocs>100</maxDocs> -->
>        <maxTime>300000</maxTime>
>      <openSearcher>false</openSearcher>
>     </autoCommit>
>
>   <!--
>     <autoSoftCommit>
>         <maxTime>30000</maxTime>
>     </autoSoftCommit>
>   -->
>
> 2015-09-26 18:47:44 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] - Indexed 10000/1205451
> 2015-09-26 18:50:49 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] - Indexed 20000/1195451
> 2015-09-26 18:54:18 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] - Indexed 30000/1185451
> 2015-09-26 18:57:04 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] - Indexed 40000/1175451
> 2015-09-26 19:00:10 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] - Indexed 50000/1165451
> 2015-09-26 19:00:13 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] - Indexed 50000/0
> 2015-09-26 19:00:13 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] - FINISHED !!!
>
>
> The query still returned 0 results when there are over a million docs
> available which match uuid:sun.org.mozilla* ... Then why do I get 0 ???
>
> Thanks
>
> Ravi Kiran Bhaskar
>
> On Sat, Sep 26, 2015 at 3:49 PM, Ravi Solr <ravis...@gmail.com> wrote:
>
>> Thank you Erick & Shawn for taking significant time off your weekends to
>> debug and explain in great detail. I will try to address the main points
>> from your emails and provide more context for a better understanding of my
>> situation.
>>
>> 1. Erick, as part of our upgrade from 4.7.2 to 5.3.0 I re-indexed all docs
>> from my old Master-Slave to my SolrCloud using the DIH SolrEntityProcessor,
>> which used a Script Transformer. I unwittingly messed up the script and
>> hence this 'uuid' (String type field) got messed up. All records prior to
>> Sep 20 2015 have this issue that I am currently trying to rectify.
>>
>> 2. Regarding openSearcher=true/false, I had it as false all along in my
>> 4.7.2 config. I read somewhere that SolrCloud or 5.x doesn't honor it or it
>> should be left default (Don't exactly remember where I read it), hence, I
>> removed it from my solrconfig.xml going against my intuition :-)
>>
>> 3. Erick, I wasn't getting all 1.4 million in one shot. I was initially using
>> 100-doc batches, which I later increased to 500 docs per batch. Also it
>> would not be an infinite loop if I commit for each batch, right !!??
>>
>> 4. Shawn, you are correct: the uuid is of String type and it's not the unique
>> key for my schema. My uniqueKey is uniqueId, and systemid is of no
>> consequence here; it's another field for differentiating apps within my
>> Solr.
>>
>> Thank you very much again, guys. I will incorporate your suggestions and
>> report back.
>>
>> Thanks
>>
>> Ravi Kiran Bhaskar
>>
>> On Sat, Sep 26, 2015 at 12:58 PM, Erick Erickson <erickerick...@gmail.com>
>> wrote:
>>
>>> Oh, one more thing. _assuming_ you can't change the indexing process
>>> that gets the docs from the system of record, why not just add an
>>> update processor that does this at index time? See:
>>> https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors
>>> ,
>>> in particular the StatelessScriptUpdateProcessorFactory might be a
>>> good candidate. It just takes a bit of javascript (or other scripting
>>> language) and changes the record before it gets indexed.
>>>
>>> FWIW,
>>> Erick
>>>
>>> On Sat, Sep 26, 2015 at 9:52 AM, Shawn Heisey <apa...@elyograg.org>
>>> wrote:
>>> > On 9/26/2015 10:41 AM, Shawn Heisey wrote:
>>> >> <autoCommit> <maxTime>300000</maxTime> </autoCommit>
>>> >
>>> > This needs to include openSearcher=false, as Erick mentioned.  I'm sorry
>>> > I screwed that up:
>>> >
>>> >   <autoCommit>
>>> >     <maxTime>300000</maxTime>
>>> >     <openSearcher>false</openSearcher>
>>> >   </autoCommit>
>>> >
>>> > Thanks,
>>> > Shawn
>>>
>>
>>
