Try running the query you're using in DIH from the command line on the DB host and 
on the Solr host to see what kind of times you get from the DB itself and over the 
network; your bottleneck might be there.  If you find that's not it, take a look at 
this post on high-performance DIH imports; you can get a serious improvement in 
performance by not using sub-entities:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3c9f8b39cb3b7c6d4594293ea29ccf438b01702...@icq-mail.icq.il.office.aol.com%3e
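
The gist of that approach, as a rough sketch (hypothetical data-config.xml; the table 
and column names are made up, and MySQL's GROUP_CONCAT plus DIH's RegexTransformer 
splitBy is one way to flatten the join, not necessarily the exact setup from the 
post): fetch the multivalued data in the same query as the parent rows instead of 
firing a sub-entity query per row.

  <dataConfig>
    <dataSource type="JdbcDataSource"
                driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://dbhost/mydb"
                user="solr" password="..."/>
    <document>
      <!-- one flat query; no nested entity issuing a separate query per parent row -->
      <entity name="item" transformer="RegexTransformer"
              query="SELECT i.id, i.name,
                            GROUP_CONCAT(t.tag SEPARATOR ',') AS tags
                     FROM item i
                     LEFT JOIN item_tag t ON t.item_id = i.id
                     GROUP BY i.id">
        <field column="id"   name="id"/>
        <field column="name" name="name"/>
        <!-- RegexTransformer splits the concatenated string back into multiple values -->
        <field column="tags" name="tags" splitBy=","/>
      </entity>
    </document>
  </dataConfig>

With 0.6 million rows in the main table, a sub-entity means 0.6 million extra queries 
against MySQL, which is usually where most of the import time goes.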
 

Ephraim Ofir

-----Original Message-----
From: Dennis Gearon [mailto:gear...@sbcglobal.net] 
Sent: Saturday, October 09, 2010 10:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Speeding up solr indexing

Looking at it, and not knowing how much memory the other processes on your box 
use (nor how much memory you have set aside for Java), I would start by 
DOUBLING your RAM. Make sure that you have enough Java memory.

You will know if it has some effect by using the 2:1 size ratio. 100 MB for all 
that data is pretty small, I think.


Use the scientific method: change only one parameter at a time and check the 
results.
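
For example, one isolated change (just a sketch; the 256 is an illustrative value, 
not a recommendation from this thread) would be to raise ramBufferSizeMB in 
solrconfig.xml, re-run the import, and compare times before touching anything else:

  <indexDefaults>
    <!-- was 100 in the config quoted below; bumped as the single variable under test -->
    <ramBufferSizeMB>256</ramBufferSizeMB>
  </indexDefaults>

(The same setting also appears under <mainIndex> in the quoted config, so it would 
need to change in both places to take effect for the main index.)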

It's always one of four things:
(in different order depending on task, but listed alphabetically here)
------------------------------
Memory (process assigned and/or actual physical memory)
Processor
Network Bandwidth
Hard Drive Bandwidth
(sometimes you can also count motherboard I/O paths;
 as of this date, AMD has more I/O paths in its
 consumer line of processors.)

In order of ease of experimenting (easiest to hardest):
-----------------------------------
Application/process-assigned memory
Physical memory
Network Bandwidth
Hard Drive Bandwidth
  Screaming fast 15K RPM SCSI drives
  RAID arrays, casual
  RAID arrays, professional
  External DRAM drive (64 GB max; RAID them for more)
Processor(s)
  Put in the fastest/largest-cache CPU the motherboard will take.
  Otherwise, USUALLY requires changing motherboard/HOSTING setup
I/O channels
  USUALLY requires changing motherboard/HOSTING setup





Dennis Gearon

Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Sat, 10/9/10, sivaprasad <sivaprasa...@echidnainc.com> wrote:

> From: sivaprasad <sivaprasa...@echidnainc.com>
> Subject: Re: Speeding up solr indexing
> To: solr-user@lucene.apache.org
> Date: Saturday, October 9, 2010, 8:09 AM
> 
> Hi,
> Please find the configurations below.
> 
> Machine configuration (Solr is running here):
> 
> RAM - 4 GB
> Hard disk - 180 GB
> OS - Red Hat Linux version 5
> Processor - 2x Intel Core 2 Duo CPU @ 2.66 GHz
> 
> Machine configuration (MySQL server is running here):
> RAM - 4 GB
> Hard disk - 180 GB
> OS - Red Hat Linux version 5
> Processor - 2x Intel Core 2 Duo CPU @ 2.66 GHz
> 
> MySQL server details:
> MySQL version - 5.0.22
> 
> Solr configuration details:
> 
>  <indexDefaults>
>    <useCompoundFile>false</useCompoundFile>
>    <mergeFactor>20</mergeFactor>
>    <!--<maxBufferedDocs>1000</maxBufferedDocs>-->
>    <ramBufferSizeMB>100</ramBufferSizeMB>
>    <maxMergeDocs>2147483647</maxMergeDocs>
>    <maxFieldLength>10000</maxFieldLength>
>    <writeLockTimeout>1000</writeLockTimeout>
>    <commitLockTimeout>10000</commitLockTimeout>
>    <!--<luceneAutoCommit>false</luceneAutoCommit>-->
>    <!--<mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>-->
>    <!--<mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler>-->
>    <lockType>single</lockType>
>  </indexDefaults>
> 
>  <mainIndex>
>    <useCompoundFile>false</useCompoundFile>
>    <ramBufferSizeMB>100</ramBufferSizeMB>
>    <mergeFactor>20</mergeFactor>
>    <!--<maxBufferedDocs>1000</maxBufferedDocs>-->
>    <maxMergeDocs>2147483647</maxMergeDocs>
>    <maxFieldLength>10000</maxFieldLength>
>    <unlockOnStartup>false</unlockOnStartup>
>  </mainIndex>
> 
>  <!-- the default high-performance update handler -->
>  <updateHandler class="solr.DirectUpdateHandler2">
>    <maxPendingDeletes>100000</maxPendingDeletes>
>    <autoCommit>
>      <maxDocs>10000</maxDocs>
>      <maxTime>60000</maxTime>
>    </autoCommit>
> 
>    <!-- A postCommit event is fired after every commit or optimize command
>    <listener event="postCommit" class="solr.RunExecutableListener">
>      <str name="exe">solr/bin/snapshooter</str>
>      <str name="dir">.</str>
>      <bool name="wait">true</bool>
>      <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
>      <arr name="env"> <str>MYVAR=val1</str> </arr>
>    </listener>
>    -->
>    <!-- A postOptimize event is fired only after every optimize command,
>         useful in conjunction with index distribution to only distribute
>         optimized indicies
>    <listener event="postOptimize" class="solr.RunExecutableListener">
>      <str name="exe">snapshooter</str>
>      <str name="dir">solr/bin</str>
>      <bool name="wait">true</bool>
>    </listener>
>    -->
>  </updateHandler>
> 
> Solr document details:
> 
> 21 fields are indexed and stored.
> 3 fields are indexed only.
> 3 fields are stored only.
> 3 fields are indexed, stored, and multivalued.
> 2 fields are indexed and multivalued.
> 
> I am also copying some of the indexed fields; two of these fields are
> multivalued and have thousands of values.
> 
> In the db-config file, the main table contains 0.6 million records.
> 
> When I tested with the same records, indexing took 1 hr 30 min. In that
> case, one of the multivalued field's tables had no records. After putting
> data into that table (for each main-table record it now has thousands of
> records, and this field is indexed and stored), indexing takes more than
> 24 hrs.
> 
> Solr is running on Tomcat 6.0.26 with JDK 1.6.0_17 and Solr 1.4.1.
> 
> I am using the JVM's default settings.
> 
> Why is this taking so much time? Does anybody have suggestions on where
> I am going wrong?
> 
> Thanks,
> JS
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Speeding-up-solr-indexing-tp1667054p1670737.html
> Sent from the Solr - User mailing list archive at
> Nabble.com.
> 
