In my first try with the DIH, I had several sub-entities, and it was making six queries per document. My 20M-document load was going to take many hours, most of a day. I rewrote it to eliminate the sub-entities; now it makes a single query for the whole load and finishes in 70 minutes. These are small documents, just the metadata for each book.
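The flattened approach Walter describes might look roughly like the DIH sketch below. This is illustrative only: the table, column, and connection details are hypothetical, and only the field names come from Devon's schema later in the thread.

```xml
<!-- Sketch of a flattened data-config.xml: one SELECT returns every field,
     so no per-document sub-entity queries are issued.
     Table names, URL, and credentials are made up. -->
<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost;databaseName=books"
              user="solr" password="***"/>
  <document>
    <entity name="book"
            query="SELECT Id, RecordId, RecordType, Name, NameType FROM BookMetadata">
      <!-- A sub-entity like this one fires an extra query for every parent
           row; several of them multiply the query count per document:
      <entity name="author"
              query="SELECT AuthorName FROM Authors WHERE BookId='${book.Id}'"/>
      -->
    </entity>
  </document>
</dataConfig>
```

If the related data cannot be avoided entirely, joining it into the parent SELECT (or into the stored procedure) keeps the import at one round trip per batch instead of several per document.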
wunder
Search Guy
Chegg

On Feb 22, 2012, at 9:41 AM, Devon Baumgarten wrote:

> I changed the heap size (Xmx1582m was as high as I could go). The import is
> at about 5% now, and from that I now estimate about 13 hours. It's hard to
> say, though; it keeps going up little by little.
>
> If I get approval to use Solr for this project, I'll have them install a
> 64-bit JVM instead, but is there anything else I can do?
>
> Devon Baumgarten
> Application Developer
>
> -----Original Message-----
> From: Devon Baumgarten [mailto:dbaumgar...@nationalcorp.com]
> Sent: Wednesday, February 22, 2012 10:32 AM
> To: 'solr-user@lucene.apache.org'
> Subject: RE: Unusually long data import time?
>
> Oh sure! As best as I can, anyway.
>
> I have not set the Java heap size, or really configured it at all.
>
> The server running both SQL Server and Solr has:
> * 2 Intel Xeon X5660 CPUs (each 2.8 GHz, 6 cores, 12 logical processors)
> * 64 GB RAM
> * one Solr instance (no shards)
>
> I'm not using faceting.
> My schema has these fields:
>
> <field name="Id" type="string" indexed="true" stored="true" />
> <field name="RecordId" type="int" indexed="true" stored="true" />
> <field name="RecordType" type="string" indexed="true" stored="true" />
> <field name="Name" type="LikeText" indexed="true" stored="true" termVectors="true" />
> <field name="NameFuzzy" type="FuzzyText" indexed="true" stored="true" termVectors="true" />
> <copyField source="Name" dest="NameFuzzy" />
> <field name="NameType" type="string" indexed="true" stored="true" />
>
> Custom types:
>
> * LikeText
>     PatternReplaceCharFilterFactory ("\W+" => "")
>     KeywordTokenizerFactory
>     StopFilterFactory (~40 words in stoplist)
>     ASCIIFoldingFilterFactory
>     LowerCaseFilterFactory
>     EdgeNGramFilterFactory
>     LengthFilterFactory (min: 3, max: 512)
>
> * FuzzyText
>     PatternReplaceCharFilterFactory ("\W+" => "")
>     KeywordTokenizerFactory
>     StopFilterFactory (~40 words in stoplist)
>     ASCIIFoldingFilterFactory
>     LowerCaseFilterFactory
>     NGramFilterFactory
>     LengthFilterFactory (min: 3, max: 512)
>
> Devon Baumgarten
>
> -----Original Message-----
> From: Glen Newton [mailto:glen.new...@gmail.com]
> Sent: Wednesday, February 22, 2012 9:24 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Unusually long data import time?
>
> Import times will depend on:
> - hardware (disk speed, CPU, number of CPUs, amount of memory, etc.)
> - Java configuration (heap size, etc.)
> - Lucene/Solr configuration (many settings)
> - index configuration (how many fields, indexed how; faceting, etc.)
> - OS configuration (usually to a lesser degree)
> - network issues, if the database is non-local
> - DB configuration (driver, etc.)
>
> If you can give more information about the above, people on this list
> should be able to better indicate whether 18 hours sounds right for
> your situation.
>
> -Glen Newton
>
> On Wed, Feb 22, 2012 at 10:14 AM, Devon Baumgarten
> <dbaumgar...@nationalcorp.com> wrote:
>> Hello,
>>
>> Would it be unusual for an import of 160 million documents to take 18 hours?
>> Each document is less than 1 KB, and I have the DataImportHandler using the
>> JDBC driver to connect to SQL Server 2008. The full-import query calls a
>> stored procedure that contains only a SELECT from my target table.
>>
>> Is there any way I can speed this up? I saw recently that someone on this
>> list suggested a new user could get all their Solr data imported in under
>> an hour. I sure hope that's true!
>>
>> Devon Baumgarten
>
> --
> -
> http://zzzoot.blogspot.com/
> -
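Beyond the DIH query pattern, the NGramFilterFactory in the FuzzyText chain is a plausible contributor to the slow import: applied to a keyword-tokenized full name, it emits a term for every substring in the configured size range, whereas EdgeNGramFilterFactory emits only prefixes. A rough Python illustration of the difference in term counts (the gram sizes are assumptions; the thread does not give them, and this only sketches the concept, not Lucene's implementation):

```python
def edge_ngrams(token, min_gram=3, max_gram=5):
    """Prefix grams only, in the spirit of EdgeNGramFilter (front edge)."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

def ngrams(token, min_gram=3, max_gram=5):
    """Every substring in the size range, in the spirit of NGramFilter."""
    return [token[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(token) - n + 1)]

name = "baumgarten"
print(edge_ngrams(name))   # ['bau', 'baum', 'baumg'] -- 3 terms
print(len(ngrams(name)))   # 21 terms for the same single token
```

Multiplied across 160 million name documents, that per-token term blowup inflates both indexing time and index size, which is one reason the fuzzy field is worth profiling separately.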
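For scale, the raw figures quoted in this thread work out to quite different indexing rates, simple arithmetic on the numbers given above:

```python
# Back-of-envelope throughput from the figures mentioned in the thread.
runs = {
    "160M docs in 18 hours (Devon's import)": 160_000_000 / (18 * 3600),
    "20M docs in 70 minutes (Walter's single-query DIH)": 20_000_000 / (70 * 60),
}
for label, rate in runs.items():
    print(f"{label}: ~{rate:,.0f} docs/sec")
# ~2,469 docs/sec vs. ~4,762 docs/sec
```

Even at Walter's roughly doubled rate, 160 million documents would still take around nine hours, so the per-document query count is only part of the picture; heap size, analyzer cost, and JDBC batch behavior all remain on the table.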