DIH does not maintain any state between two runs, so if there is a
performance degradation it could be because:
- Solr indexing is taking longer after you do a delete *:* (which
  clean=true issues at the start of every full-import)
- your RAM is insufficient (your machine is swapping)

On Fri, Dec 19, 2008 at 2:51 AM, Glen Newton <glen.new...@gmail.com> wrote:
> Hello,
>
> I am using Solr 1.4 (solr-2008-11-19) with Lucene 2.4 dropped in instead of 2.9.
>
> I am indexing 500k records using the JDBC Data Import Request Handler.
>
> Config:
>  Linux openSUSE 10.2 (X86-64)
>  Dual core dual core 64bit Xeon 3GHz Dell blade  8GB RAM
>  java version "1.6.0_07"
>  Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
>  Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed mode)
>  1GB heap for Tomcat
>  DB: MySQL on a separate but similar server
>
> I am finding that when I do a full-import followed by another
> full-import, the import takes much longer the second and subsequent
> times:
> Run1 = 0:27:31.491
> Run2 = 1:14:44.821
> Run3 = 1:14:48.316
> Run4 = 2:15:12.296
> Run5 = 1:37:06.847
>
> (I have run this ~10 times and got roughly the same results.) I have
> also monitored the load on the Solr machine and the database machine
> for any other activity that might affect the timings.
>
> The final Lucene index size is 923MB. The default is clean='true', so
> the index is cleared (emptied) each time, which is why I am concerned that
> the second and later runs take roughly three to five times as long as the first.
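A quick arithmetic check on the timings above (a sketch added for illustration, not from the original thread): the Python snippet converts each quoted run time to seconds and divides it by Run1. It assumes every time is in h:mm:ss.fff form, so Run2 is read as 1:14:44.821 and Run5 as 1:37:06.847.

```python
# Wall-clock times of the five full-imports reported above.
# Assumption: all values are h:mm:ss.fff (Run2 read as 1:14:44.821,
# Run5 read as 1:37:06.847).
RUNS = ["0:27:31.491", "1:14:44.821", "1:14:48.316", "2:15:12.296", "1:37:06.847"]


def to_seconds(stamp: str) -> float:
    """Convert an h:mm:ss.fff duration string to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def slowdown_ratios(stamps):
    """Return each run's duration divided by the first run's duration."""
    durations = [to_seconds(s) for s in stamps]
    return [d / durations[0] for d in durations]


if __name__ == "__main__":
    for i, ratio in enumerate(slowdown_ratios(RUNS), start=1):
        print(f"Run{i}: {ratio:.2f}x")
```

On these figures, the later runs take roughly 2.7 to 4.9 times as long as Run1.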
>
> Am I doing something wrong here? Any help would be appreciated.
>
> I have appended my data-config.xml.
>
> thanks,
>
> Glen
>
> <dataConfig>
> <dataSource driver="com.mysql.jdbc.Driver"
> url="jdbc:mysql://blue01/dartejos" user="USER" password="PASSWD"/>
>    <document name="products">
>        <entity name="item" query="select  Publisher.name as pub,
> Journal.title as jo, Article.rawUrl as textpath, Journal.issn,
> Volume.number as vol,Volume.coverYear as year, Issue.number as iss,
> Article.id,Article.title as ti, Article.abstract, Article.startPage as
> startPage,Article.endPage as endPage from Publisher, Journal, Volume,
> Issue, Article where Publisher.id = Journal.publisherId and Journal.id
> = Volume.journalId and Volume.id = Issue.volumeId and Issue.id =
> Article.issueId  limit 500000">
>            <field column="id" name="id" />
>            <field column="jo" name="id" />
>            <field column="issn" name="id" />
>            <field column="vol" name="id" />
>            <field column="year" name="id" />
>            <field column="iss" name="id" />
>            <field name="abstract" column="abstract"/>
>            <!-- the query aliases Article.title as "ti", so the column is "ti" -->
>            <field name="title" column="ti"/>
>            <field name="pub" column="pub"/>
>            <field name="textpath" column="textpath"/>
>            <field name="startPage" column="startPage"/>
>            <field name="endPage" column="endPage"/>
>        </entity>
>    </document>
> </dataConfig>



-- 
--Noble Paul
