Hi,

I am new to Solr, and we want to use it to speed up our product search.
It is working really nicely, but I think I have a problem with indexing:
it slows down after a few minutes.

I am using the DataImportHandler to import the products from the database,
and I start the import with the following HTTP request:
/dataimport?command=full-import&clean=true&commit=true

I guess these are the important parts of my configuration:

schema.xml:
----------------------------------------------
<fields>
   <field name="pk"               type="long"        indexed="true"  stored="true" required="true"  />
   <field name="code"             type="string"      indexed="true"  stored="true" required="true"  />
   <field name="ean"              type="string"      indexed="true"  stored="false" />
   <field name="name"             type="lowercase"   indexed="true"  stored="false" />
   <field name="text"             type="text_general" indexed="true" stored="false" multiValued="true" />
   <field name="_version_" type="long" indexed="true" stored="true"/>
</fields>
....
    <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
      </analyzer>
    </fieldType>
----------------------------------------------

solrconfig.xml:
----------------------------------------------
  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
        <str name="config">dataimport-handler.xml</str>
    </lst>
  </requestHandler>
----------------------------------------------

dataimport-handler.xml:
----------------------------------------------
<dataConfig>
    <dataSource name="local" driver="*************"
                url="*************"
                user="*************"
                password="*************"
                />
    <document>
        <entity name="product" pk="PRODUCTS_PK" dataSource="local"
                query="SELECT PRODUCTS_PK, PRODUCTS_CODE, PRODUCTS_EAN, PRODUCTSLP_NAME
                       FROM V_SOLR_IMPORT4PRODUCT_SEARCH">
            <field column="PRODUCTS_PK"       name="pk" />
            <field column="PRODUCTS_CODE"     name="code" />
            <field column="PRODUCTS_EAN"      name="ean" />
            <field column="PRODUCTSLP_NAME"   name="name" />
        </entity>
    </document>
</dataConfig>
----------------------------------------------

The number of documents I want to index is 8 million. The first 1.6 million are
indexed in 2 minutes, but the complete import takes nearly 2 hours.
The size of the index on the hard drive is 610 MB.
I started the Solr server with 2 GB of memory.


I read that the duration of indexing might be connected to the batch size, so I
increased the batchSize in the dataSource to 10,000, but this didn't make any
difference.
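
For reference, a sketch of that change, assuming batchSize is set as an
attribute on the same dataSource element as above (connection values are
redacted as before; only the batchSize line is new):

```xml
<!-- Sketch only: batchSize added to the existing JDBC dataSource. -->
<dataSource name="local" driver="*************"
            url="*************"
            user="*************"
            password="*************"
            batchSize="10000"
            />
```
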
I also tried to disable the autocommit that is configured in
solrconfig.xml. I disabled it by commenting it out, but this also didn't make any
difference.
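
A sketch of what the disabled block looks like, assuming the autoCommit
settings of the stock example solrconfig.xml (the maxTime and openSearcher
values are assumptions and are not copied from my actual file):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Assumed stock values; autoCommit is disabled by wrapping it in a comment.
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  -->
</updateHandler>
```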

It would be really nice if someone could help me with this problem.

Thank you very much,
Sebastian
