On 9/25/2015 10:10 PM, Ravi Solr wrote:
> Thank you for taking the time to help me out. Yes, I was not using
> cursorMark; I will try that next. This is what I was doing. It's a bit
> shabby coding, but what can I say, my brain was fried :-) FYI, this is
> a side process just to correct a messed-up string. The actual indexing
> process was running the whole time, as our business owners are a bit
> petulant about stopping indexing. My autoCommit config and code are
> given below; as you can see, autoCommit should fire every 100 docs
> anyway.
It took a while, but I finally managed to see how this code pages through
the docs. You are filtering on the text that you are removing, which does
require that the previous changes be committed before the loop runs again.
Switching to cursorMark is probably not necessary if you optimize your
query and your commits.

My advice incorporates some of what Erick said and some ideas of my own:

I think you should remove autoSoftCommit, set autoCommit to a maxTime of
300000 (five minutes), and not include maxDocs:

  <autoCommit>
    <maxTime>300000</maxTime>
  </autoCommit>

Remove the five-second sleep from the code. I would also increase the
number of documents for each loop beyond 100 -- to a minimum of 1000,
possibly more like 10000. The call to getDocs inside the loop should not
use the size of the previous result; it should use the number of docs you
define for the loop.

After the "add" call in your processDocs method, send a soft commit, so
the code looks like this:

  client.add(inList);
  client.commit(true, true, true);

The autoCommit will ensure your transaction log never gets very large, and
the soft commit in your code will make the changes visible as quickly as
possible. You might find that some loops take longer than five seconds,
but it should work.

You also need to remove the "uuid:[* TO *]" filter. It does unnecessary
(and fairly slow) work on the server side -- the other filter already
guarantees that the results match the range filter, so the range filter is
redundant. I assume you have tried the query manually, so you know it
actually works?

I'm guessing that uuid is a StrField, not an actual UUID type. I'm
reasonably certain that if it were a UUID type, it would not have accepted
the class name that you are trying to remove.

What is your uniqueKey field? I hope it's not uuid; I don't think you
would get the results you want if it were. Your code excerpt hints that
the uniqueKey is another field.
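Putting those pieces together, the loop would look roughly like this. This
is only a sketch of the structure, not your actual class: the names
(RepairLoopSketch, fetchBatch, addAndSoftCommit) are hypothetical, and the
SolrJ calls (the filtered query, client.add, client.commit(true, true,
true)) are stubbed with an in-memory list so the shape of the loop is
clear. The key idea is that each soft commit removes the fixed docs from
the filter's result set, so re-running the same query eventually returns
nothing:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the repair loop: query for docs that still contain the bad
// string, fix a fixed-size batch, add them back, soft commit, and repeat
// until the query comes back empty. The "index" below is an in-memory
// stand-in for Solr; swap in real SolrJ query/add/commit calls.
public class RepairLoopSketch {
    // Per-loop doc count -- a constant, not the previous result's size.
    static final int BATCH_SIZE = 1000;

    // Stand-in for the index: docs that still match the filter query.
    static List<String> index = new ArrayList<>();

    // Stand-in for the filtered query with rows=BATCH_SIZE.
    static List<String> fetchBatch() {
        int n = Math.min(BATCH_SIZE, index.size());
        return new ArrayList<>(index.subList(0, n));
    }

    // Stand-in for client.add(inList); client.commit(true, true, true);
    // The soft commit makes the fixed docs visible, so the next
    // fetchBatch() no longer returns them.
    static void addAndSoftCommit(List<String> fixed) {
        index.removeAll(fixed);
    }

    // Runs the loop; returns how many docs were repaired.
    static int processAll() {
        int total = 0;
        List<String> batch = fetchBatch();
        while (!batch.isEmpty()) {
            // ... strip the bad string from each doc here ...
            addAndSoftCommit(batch);
            total += batch.size();
            batch = fetchBatch(); // re-query; committed fixes are visible
        }
        return total;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 2500; i++) {
            index.add("doc" + i);
        }
        System.out.println("repaired " + processAll() + " docs");
    }
}
```

With the five-minute autoCommit handling the transaction log, the loop
itself only ever issues soft commits, which is why the sleep can go away.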
I pulled your code into a new Eclipse project and made the recommended
changes, plus a few other very small modifications. The results are here:

http://apaste.info/w48

I had no context for the "systemid" variable, so I defined it just to get
rid of the compiler error; it is only used for logging. I also had to
define the "log" variable to get the code to validate, which I think
you've already done in your own class, so that can be removed from my
workup. The code is formatted to my company's standard formatting, which
probably doesn't match your own standard.

Something I just noticed: you could probably remove the sort from the
query, which might reduce the amount of memory used on the Solr server and
make everything generally faster.

If the modified code still runs into problems, there might be a serious
issue on the server side of your Solr install.

Thanks,
Shawn