Hello,

I am using Solr 1.4 (solr-2008-11-19) with Lucene 2.4 dropped in instead of 2.9.

I am indexing 500k records using the JDBC Data Import Request Handler.

Config:
 Linux openSUSE 10.2 (X86-64)
 Dual dual-core 64-bit Xeon 3GHz, Dell blade, 8GB RAM
 java version "1.6.0_07"
 Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
 Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed mode)
 1GB heap for Tomcat
 DB: MySQL on a separate but similar server

I am finding that when I do a full-import followed by another
full-import, the import takes much longer on the second and subsequent
runs:
Run1 = 0:27:31.491
Run2 = 1:14:44.821
Run3 = 1:14:48.316
Run4 = 2:15:12.296
Run5 = 1:37:06.847
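
To put numbers on the slowdown, the timings above can be converted to seconds and compared against Run1. This is only a minimal Python sketch: the helper name `to_seconds` is made up for illustration, and it assumes the Run2 figure is meant to read 1:14:44.821.

```python
def to_seconds(stamp):
    """Convert an 'H:MM:SS.mmm' duration string to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

# Timings copied from the runs listed above
# (assumption: Run2 is read as 1:14:44.821).
runs = {
    "Run1": "0:27:31.491",
    "Run2": "1:14:44.821",
    "Run3": "1:14:48.316",
    "Run4": "2:15:12.296",
    "Run5": "1:37:06.847",
}

baseline = to_seconds(runs["Run1"])

# Ratio of each run's duration to the first run's duration.
slowdown = {name: to_seconds(t) / baseline for name, t in runs.items()}

for name, ratio in slowdown.items():
    print(f"{name}: {to_seconds(runs[name]):9.3f} s  ({ratio:.2f}x Run1)")
```

On these figures the later runs come out somewhere between about 2.7 and 4.9 times the duration of the first.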

(I have run this ~10 times and got roughly the same results.) I have
also monitored the load on the Solr machine and the database machine
for any other activity that might affect the timings.

The final Lucene index size is 923MB. The default is clean='true', so
the index is cleared (emptied) before each run. I am therefore concerned
that the second and later runs take roughly 3 to 5 times as long as the
first.
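
For anyone trying to reproduce this, the import can be triggered with the clean flag spelled out rather than left to the default. The following is a rough Python sketch, not my exact setup: the host, port and handler path in `SOLR_DATAIMPORT` are assumptions that depend on how the handler is registered in solrconfig.xml, and the function names are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL: host, port and handler path depend on the local setup.
SOLR_DATAIMPORT = "http://localhost:8080/solr/dataimport"


def full_import_url(base=SOLR_DATAIMPORT, clean=True):
    """Build the URL for a full-import with an explicit clean flag."""
    params = {"command": "full-import", "clean": "true" if clean else "false"}
    return base + "?" + urlencode(params)


def trigger_full_import(base=SOLR_DATAIMPORT, clean=True):
    """Send the full-import command and return the handler's raw response."""
    with urlopen(full_import_url(base, clean)) as response:
        return response.read().decode("utf-8")


# Only builds and prints the URL; call trigger_full_import() to actually run it.
print(full_import_url())
```

Passing the flag explicitly at least rules out any doubt about whether the index really is emptied before each run.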

Am I doing something wrong here? Any help would be appreciated.

I have appended my data-config.xml below.

thanks,

Glen

<dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://blue01/dartejos" user="USER" password="PASSWD"/>
    <document name="products">
        <entity name="item" query="select Publisher.name as pub,
            Journal.title as jo, Article.rawUrl as textpath, Journal.issn,
            Volume.number as vol, Volume.coverYear as year, Issue.number as iss,
            Article.id, Article.title as ti, Article.abstract,
            Article.startPage as startPage, Article.endPage as endPage
            from Publisher, Journal, Volume, Issue, Article
            where Publisher.id = Journal.publisherId
              and Journal.id = Volume.journalId
              and Volume.id = Issue.volumeId
              and Issue.id = Article.issueId
            limit 500000">
            <field column="id" name="id" />
            <field column="jo" name="id" />
            <field column="issn" name="id" />
            <field column="vol" name="id" />
            <field column="year" name="id" />
            <field column="iss" name="id" />
            <field name="abstract" column="abstract"/>
            <field name="title" column="title"/>
            <field name="pub" column="pub"/>
            <field name="textpath" column="textpath"/>
            <field name="startPage" column="startPage"/>
            <field name="endPage" column="endPage"/>
        </entity>
    </document>
</dataConfig>
