In my first try with the DIH, I had several sub-entities, and it was making six queries per document. My 20M-document load was going to take many hours, most of a day. I rewrote it to eliminate the sub-entities; now it makes a single query for the whole load and finishes in 70 minutes. These are small documents, just the metadata for each book.
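The flattened approach Walter describes might look roughly like the DIH sketch below. This is illustrative only: the table, column, and connection details are hypothetical, and only the field names come from Devon's schema later in the thread.

```xml
<!-- Sketch of a flattened data-config.xml: one SELECT returns every field,
     so no per-document sub-entity queries are issued.
     Table names, URL, and credentials are made up. -->
<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost;databaseName=books"
              user="solr" password="***"/>
  <document>
    <entity name="book"
            query="SELECT Id, RecordId, RecordType, Name, NameType FROM BookMetadata">
      <!-- A sub-entity like this one fires an extra query for every parent
           row; several of them multiply the query count per document:
      <entity name="author"
              query="SELECT AuthorName FROM Authors WHERE BookId='${book.Id}'"/>
      -->
    </entity>
  </document>
</dataConfig>
```

If the related data cannot be avoided entirely, joining it into the parent SELECT (or into the stored procedure) keeps the import at one round trip per batch instead of several per document.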
wunder
Search Guy
Chegg

On Feb 22, 2012, at 9:41 AM, Devon Baumgarten wrote:

> I changed the heap size (Xmx1582m was as high as I could go). The import is
> at about 5% now, and from that I now estimate about 13 hours. It's hard to
> say, though; it keeps going up little by little.
>
> If I get approval to use Solr for this project, I'll have them install a
> 64-bit JVM instead, but is there anything else I can do?
>
> Devon Baumgarten
> Application Developer
>
> -----Original Message-----
> From: Devon Baumgarten [mailto:dbaumgar...@nationalcorp.com]
> Sent: Wednesday, February 22, 2012 10:32 AM
> To: 'solr-user@lucene.apache.org'
> Subject: RE: Unusually long data import time?
>
> Oh sure! As best as I can, anyway.
>
> I have not set the Java heap size, or really configured it at all.
>
> The server running both SQL Server and Solr has:
> * 2 Intel Xeon X5660 CPUs (each 2.8 GHz, 6 cores, 12 logical processors)
> * 64 GB RAM
> * one Solr instance (no shards)
>
> I'm not using faceting.
> My schema has these fields:
>
> <field name="Id" type="string" indexed="true" stored="true" />
> <field name="RecordId" type="int" indexed="true" stored="true" />
> <field name="RecordType" type="string" indexed="true" stored="true" />
> <field name="Name" type="LikeText" indexed="true" stored="true" termVectors="true" />
> <field name="NameFuzzy" type="FuzzyText" indexed="true" stored="true" termVectors="true" />
> <copyField source="Name" dest="NameFuzzy" />
> <field name="NameType" type="string" indexed="true" stored="true" />
>
> Custom types:
>
> * LikeText
>     PatternReplaceCharFilterFactory ("\W+" => "")
>     KeywordTokenizerFactory
>     StopFilterFactory (~40 words in stoplist)
>     ASCIIFoldingFilterFactory
>     LowerCaseFilterFactory
>     EdgeNGramFilterFactory
>     LengthFilterFactory (min: 3, max: 512)
>
> * FuzzyText
>     PatternReplaceCharFilterFactory ("\W+" => "")
>     KeywordTokenizerFactory
>     StopFilterFactory (~40 words in stoplist)
>     ASCIIFoldingFilterFactory
>     LowerCaseFilterFactory
>     NGramFilterFactory
>     LengthFilterFactory (min: 3, max: 512)
>
> Devon Baumgarten
>
> -----Original Message-----
> From: Glen Newton [mailto:glen.new...@gmail.com]
> Sent: Wednesday, February 22, 2012 9:24 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Unusually long data import time?
>
> Import times will depend on:
> - hardware (disk speed, CPU, number of CPUs, amount of memory, etc.)
> - Java configuration (heap size, etc.)
> - Lucene/Solr configuration (many settings)
> - index configuration (how many fields, indexed how; faceting, etc.)
> - OS configuration (usually to a lesser degree)
> - network issues, if the database is non-local
> - DB configuration (driver, etc.)
>
> If you can give more information about the above, people on this list
> should be able to better indicate whether 18 hours sounds right for
> your situation.
>
> -Glen Newton
>
> On Wed, Feb 22, 2012 at 10:14 AM, Devon Baumgarten
> <dbaumgar...@nationalcorp.com> wrote:
>> Hello,
>>
>> Would it be unusual for an import of 160 million documents to take 18 hours?
>> Each document is less than 1 KB, and I have the DataImportHandler using the
>> JDBC driver to connect to SQL Server 2008. The full-import query calls a
>> stored procedure that contains only a SELECT from my target table.
>>
>> Is there any way I can speed this up? I saw recently that someone on this
>> list suggested a new user could get all their Solr data imported in under
>> an hour. I sure hope that's true!
>>
>> Devon Baumgarten
>
> --
> -
> http://zzzoot.blogspot.com/
> -
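Beyond the DIH query pattern, the NGramFilterFactory in the FuzzyText chain is a plausible contributor to the slow import: applied to a keyword-tokenized full name, it emits a term for every substring in the configured size range, whereas EdgeNGramFilterFactory emits only prefixes. A rough Python illustration of the difference in term counts (the gram sizes are assumptions; the thread does not give them, and this only sketches the concept, not Lucene's implementation):

```python
def edge_ngrams(token, min_gram=3, max_gram=5):
    """Prefix grams only, in the spirit of EdgeNGramFilter (front edge)."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

def ngrams(token, min_gram=3, max_gram=5):
    """Every substring in the size range, in the spirit of NGramFilter."""
    return [token[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(token) - n + 1)]

name = "baumgarten"
print(edge_ngrams(name))   # ['bau', 'baum', 'baumg'] -- 3 terms
print(len(ngrams(name)))   # 21 terms for the same single token
```

Multiplied across 160 million name documents, that per-token term blowup inflates both indexing time and index size, which is one reason the fuzzy field is worth profiling separately.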
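For scale, the raw figures quoted in this thread work out to quite different indexing rates, simple arithmetic on the numbers given above:

```python
# Back-of-envelope throughput from the figures mentioned in the thread.
runs = {
    "160M docs in 18 hours (Devon's import)": 160_000_000 / (18 * 3600),
    "20M docs in 70 minutes (Walter's single-query DIH)": 20_000_000 / (70 * 60),
}
for label, rate in runs.items():
    print(f"{label}: ~{rate:,.0f} docs/sec")
# ~2,469 docs/sec vs. ~4,762 docs/sec
```

Even at Walter's roughly doubled rate, 160 million documents would still take around nine hours, so the per-document query count is only part of the picture; heap size, analyzer cost, and JDBC batch behavior all remain on the table.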