Re: [solr-user] Upgrade from 1.2 to 1.3 gives 3x slowdown

Grant Ingersoll Thu, 02 Apr 2009 05:32:30 -0700


On Apr 2, 2009, at 4:02 AM, Fergus McMenemie wrote:

Grant,


Hmmm, the big difference is made by &overwrite=false. But
can you explain why &overwrite=false makes such a difference?
I am starting off with an empty index and I have checked the
content; there are no duplicates in the uniqueKey field.

I guess if &overwrite=false then a few checks can be removed
from the indexing process, and if I am confident that my content
contains no duplicates then this is a good speed up.

http://wiki.apache.org/solr/UpdateCSV says that if overwrite
is true (the default) then overwrite documents based on the
uniqueKey. However what will solr/lucene do if the uniqueKey
is not unique and overwrite=false?

overwrite=false means Solr does not issue deletes first, meaning if you have a doc w/ that id already, you will now have two docs with that id. The unique id is enforced by Solr, not by Lucene.
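To make the difference concrete, here is a toy in-memory model. It is only a sketch of the behaviour described above, not Solr or Lucene code; the class ToyIndex and its methods are made up for illustration.

```python
from typing import Dict, List


class ToyIndex:
    """Hypothetical stand-in for an index: an append-only list of documents.

    Like Lucene in the description above, the list itself knows nothing
    about unique keys; uniqueness is enforced only by the add() wrapper.
    """

    def __init__(self, unique_key: str = "id") -> None:
        self.unique_key = unique_key
        self.docs: List[Dict[str, str]] = []

    def add(self, doc: Dict[str, str], overwrite: bool = True) -> None:
        if overwrite:
            # overwrite=true: delete any existing doc with the same key first.
            key = doc[self.unique_key]
            self.docs = [d for d in self.docs if d[self.unique_key] != key]
        # overwrite=false: skip the delete and simply append.
        self.docs.append(doc)

    def count(self, key: str) -> int:
        """Number of stored docs whose unique key equals `key`."""
        return sum(1 for d in self.docs if d[self.unique_key] == key)


if __name__ == "__main__":
    index = ToyIndex()
    index.add({"id": "60524", "name": "Palu Marga"})
    index.add({"id": "60524", "name": "Palu Marga"})  # replaces the first
    index.add({"id": "60524", "name": "Palu Marga"}, overwrite=False)
    print(index.count("60524"))  # two docs now share the same id
```

The extra delete on every add is the work that overwrite=false skips.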

Even if you can't guarantee uniqueness, you can still do overwrite=false as a workaround using the suggestion I gave you in a prior email:

1. Add a new field that is unique for your data source, but is the same for all records in that data source, i.e. type = geonames.txt

2. Before updating, issue a delete by query for the value of that type, which will delete all records with that term

3. Do your indexing with overwrite=false
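The three steps above could be scripted roughly as follows. This is a sketch under assumptions, not a tested recipe: the base URL, the field name "type", the file path, the tab separator, and the use of stream.file (which needs remote streaming enabled in solrconfig.xml) are all guesses about the setup. The helper names are made up; only the /update and /update/csv handlers and the overwrite, commit, separator and stream.file parameters come from Solr itself.

```python
import urllib.request
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Assumption: a default single-core Solr on localhost.
SOLR_BASE = "http://localhost:8983/solr"


def delete_by_query_body(field: str, value: str) -> str:
    """Step 2: build the XML update message that deletes every doc whose
    `field` holds `value` (e.g. type:"geonames.txt")."""
    if '"' in value:
        raise ValueError("value must not contain double quotes")
    query = '%s:"%s"' % (field, value)
    return "<delete><query>%s</query></delete>" % escape(query)


def csv_update_url(csv_path: str, overwrite: bool = False,
                   commit: bool = True, separator: str = "\t") -> str:
    """Step 3: build the CSV handler URL with overwrite switched off."""
    params = {
        "stream.file": csv_path,
        "separator": separator,
        "overwrite": str(overwrite).lower(),
        "commit": str(commit).lower(),
    }
    return SOLR_BASE + "/update/csv?" + urlencode(params)


def post_xml(body: str) -> int:
    """POST an XML update message to /update and return the HTTP status."""
    request = urllib.request.Request(
        SOLR_BASE + "/update",
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Only prints what would be sent; call post_xml() and fetch the URL
    # against a real instance to actually run the steps.
    print(delete_by_query_body("type", "geonames.txt"))
    print(csv_update_url("/path/to/geonames.txt"))
```

Passing commit=true on the load also means the timing includes the commit work, rather than leaving it for shutdown.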

I should note, however, that the speed difference you are seeing may not be as pronounced as it appears. If I recall, during ApacheCon I commented on how long it takes to shut down your Solr instance when exiting it. That time is in fact Solr doing the work that was put off by not committing earlier and having all those deletes pile up.

Thus, while it is likely that your older version is still faster due to the new fsync stuff in Lucene, it may not be that much faster. I think you could see this by actually doing commit=true, but I'm not 100% sure.

fergus: perl -nlaF"\t" -e 'print "$F[2]";' geonames.txt | wc -l
1000000
fergus: perl -nlaF"\t" -e 'print "$F[2]";' geonames.txt | sort -u |wc -l
1000000
fergus: /usr/bin/head geonames.txt
RC UFI UNI LAT LONG DMS_LAT DMS_LONG MGRS JOG FC DSG PC CC1 ADM1 ADM2 POP ELEV CC2 NT LC SHORT_FORM GENERIC SORT_NAME FULL_NAME FULL_NAME_ND MODIFY_DATE
1 -1307828 60524 12.466667 -69.9 122800 -695400 19PDP0219578323 ND19-14 T MT AA 00 PALUMARGA Palu Marga Palu Marga 1995-03-23
1 -1307756 -1891720 12.5 -70.016667 123000 -700100 19PCP8952982056 ND19-14 P PPLX

PS. do you want me to do some kind of chop through the
different versions to see where the slowdown happened,
or are you happy you have nailed it?
--

===============================================================
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:

http://www.lucidimagination.com/search
