DIH does not maintain any state between two runs. So if there is a perf degradation, it could be because:
- Solr indexing is taking longer after you do a delete *:*
- Your RAM is insufficient (your machine is swapping)

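For the second point, one rough way to check on Linux is to compare SwapTotal and SwapFree in /proc/meminfo before and after an import. Below is a minimal Python sketch of that check. The function names are my own and are not part of Solr or DIH. Note that swap in use does not by itself prove the machine is actively swapping during the import. The si/so columns of vmstat show live swap activity.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field name: integer value} dict."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])  # values are in kB where a unit is given
    return info


def swap_used_kb(info):
    """Swap currently in use, in kB (SwapTotal minus SwapFree)."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def read_swap_used_kb(path="/proc/meminfo"):
    """Read the given meminfo file (Linux only) and return swap in use, in kB."""
    with open(path) as f:
        return swap_used_kb(parse_meminfo(f.read()))
```

Calling read_swap_used_kb() on the Solr machine before the first full-import and again during a slow run would show whether swap usage has grown. If it has, the 1GB Tomcat heap plus the OS file cache for a 923MB index may be competing for memory with something else on the box.
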
On Fri, Dec 19, 2008 at 2:51 AM, Glen Newton <glen.new...@gmail.com> wrote:
> Hello,
>
> I am using Solr 1.4 (solr-2008-11-19) with Lucene 2.4 dropped in instead of 2.9
>
> I am indexing 500k records using the JDBC Data Import Request Handler.
>
> Config:
> Linux openSUSE 10.2 (X86-64)
> Dual core dual core 64bit Xeon 3GHz Dell blade 8GB RAM
> java version "1.6.0_07"
> Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
> Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed mode)
> 1GB heap for Tomcat
> DB: MySql on separate but similar server
>
> I am finding that when I do a Full-Import, followed by another
> Full-import the import takes much longer the second and subsequent
> times:
> Run1 = 0:27:31.491
> Run2 = 1:14:44:821
> Run3 = 1:14:48.316
> Run4 = 2:15:12.296
> Run5 = 1:37:6.847
>
> (I have run this ~10 times and got roughly the same results). I have
> also monitored the load on the Solr machine and the databases machine
> for any other activity that might impact.
>
> The final Lucene index size is 923MB. The default clean = 'true', so
> the index is cleared (emptied) each time, so I am concerned the second
> run takes 4 times the time of the first run.
>
> Am I doing something wrong here? Any help would be appreciated.
>
> I have appended my data-config.xml
>
> thanks,
>
> Glen
>
> <dataConfig>
>   <dataSource driver="com.mysql.jdbc.Driver"
>       url="jdbc:mysql://blue01/dartejos" user="USER" password="PASSWD"/>
>   <document name="products">
>     <entity name="item" query="select Publisher.name as pub,
> Journal.title as jo, Article.rawUrl as textpath, Journal.issn,
> Volume.number as vol,Volume.coverYear as year, Issue.number as iss,
> Article.id,Article.title as ti, Article.abstract, Article.startPage as
> startPage,Article.endPage as endPage from Publisher, Journal, Volume,
> Issue, Article where Publisher.id = Journal.publisherId and Journal.id
> = Volume.journalId and Volume.id = Issue.volumeId and Issue.id =
> Article.issueId limit 500000">
>       <field column="id" name="id" />
>       <field column="jo" name="id" />
>       <field column="issn" name="id" />
>       <field column="vol" name="id" />
>       <field column="year" name="id" />
>       <field column="iss" name="id" />
>       <field name="abstract" column="abstract"/>
>       <field name="title" column="title"/>
>       <field name="pub" column="pub"/>
>       <field name="textpath" column="textpath"/>
>       <field name="startPage" column="startPage"/>
>       <field name="endPage" column="endPage"/>
>     </entity>
>   </document>
> </dataConfig>

--Noble Paul