Re: Dataimport performance

2018-06-07 Thread Shawn Heisey
On 6/7/2018 12:19 AM, kotekaman wrote:
> sorry. may i know how to code it?
Code *what*? Here's the same wiki page that I gave you for your last message: https://wiki.apache.org/solr/UsingMailingLists
Even if I go to the Nabble website and discover that you've replied to a topic that's SEVEN

Re: Dataimport performance

2018-06-07 Thread kotekaman
sorry. may i know how to code it? -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Dataimport performance

2010-12-19 Thread Alexey Serba
With subquery and with left join: 320k in 6 Min 30
It's 820 records per second. It's _really_ impressive considering the fact that DIH performs a separate SQL query for every record in your case.
So there's one track entity with an artist sub-entity. My (admittedly rather limited) experience

Re: Dataimport performance

2010-12-19 Thread Lukas Kahwe Smith
On 19.12.2010, at 23:30, Alexey Serba wrote:
> Also Ephraim proposed a really neat solution with GROUP_CONCAT, but I'm not sure that all RDBMS-es support that.
That's MySQL-only syntax. But if you google, you can find similar solutions for other RDBMSs.
regards, Lukas Kahwe Smith
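
For readers who have not seen the GROUP_CONCAT approach, here is a minimal sketch of the idea: fold the artist sub-entity into the parent query so the database returns one row per track. It is not the query from the thread. The tables `track`, `artist` and `track_artist`, their columns, and the `|` separator are all assumptions made for illustration, and SQLite's `group_concat(x, sep)` stands in for MySQL's `GROUP_CONCAT(x SEPARATOR sep)` so the example runs with Python's standard library alone. PostgreSQL offers `string_agg` for the same purpose.

```python
import sqlite3


def tracks_with_artists(conn):
    """Return one row per track, with artist names folded into one column."""
    sql = """
        SELECT t.id, t.title, GROUP_CONCAT(a.name, '|') AS artists
        FROM track t
        LEFT JOIN track_artist ta ON ta.track_id = t.id
        LEFT JOIN artist a ON a.id = ta.artist_id
        GROUP BY t.id, t.title
        ORDER BY t.id
    """
    return conn.execute(sql).fetchall()


# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE track (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE track_artist (track_id INTEGER, artist_id INTEGER);
    INSERT INTO track VALUES (1, 'First'), (2, 'Second');
    INSERT INTO artist VALUES (10, 'Ann'), (11, 'Bob');
    INSERT INTO track_artist VALUES (1, 10), (1, 11);
""")

rows = tracks_with_artists(conn)
for track_id, title, artists in rows:
    # A track without artists yields None, so fall back to an empty list.
    names = artists.split("|") if artists else []
    print(track_id, title, names)
```

The concatenated column then has to be split back into multiple values on the indexing side. The order of names inside the concatenated string is not guaranteed here.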

RE: Dataimport performance

2010-12-16 Thread Ephraim Ofir
[mailto:rob...@dubture.com] Sent: Wednesday, December 15, 2010 4:49 PM To: solr-user@lucene.apache.org Subject: Re: Dataimport performance i've benchmarked the import already with 500k records, one time without the artists subquery, and one time without the join in the main query: Without subquery

RE: Dataimport performance

2010-12-16 Thread Dyer, James
-Original Message- From: Ephraim Ofir [mailto:ephra...@icq.com] Sent: Thursday, December 16, 2010 3:04 AM To: solr-user@lucene.apache.org Subject: RE: Dataimport performance Check out http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox

Re: Dataimport performance

2010-12-16 Thread Glen Newton
[mailto:ephra...@icq.com] Sent: Thursday, December 16, 2010 3:04 AM To: solr-user@lucene.apache.org Subject: RE: Dataimport performance Check out http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3c9f8b39cb3b7c6d4594293ea29ccf438b01702...@icq-mail.icq.il.office.aol.com%3e

Dataimport performance

2010-12-15 Thread Robert Gründler
Hi, we're looking for some comparison-benchmarks for importing large tables from a mysql database (full import). Currently, a full-import of ~ 8 Million rows from a MySQL database takes around 3 hours, on a QuadCore Machine with 16 GB of ram and a Raid 10 storage setup. Solr is running on a

Re: Dataimport performance

2010-12-15 Thread Adam Estrada
What version of Solr are you using?
Adam
2010/12/15 Robert Gründler rob...@dubture.com
> Hi, we're looking for some comparison-benchmarks for importing large tables from a mysql database (full import). Currently, a full-import of ~ 8 Million rows from a MySQL database takes around 3 hours,

Re: Dataimport performance

2010-12-15 Thread Erick Erickson
You're adding on the order of 750 rows (docs)/second, which isn't bad... have you profiled the machine as this runs? Even just with top (assuming unix)... because the very first question is always what takes the time: getting the data from MySQL, indexing, or I/O? If you aren't maxing out your

Re: Dataimport performance

2010-12-15 Thread Robert Gründler
> What version of Solr are you using?
Solr Specification Version: 1.4.1
Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42
Lucene Specification Version: 2.9.3
Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55
-robert
> Adam
> 2010/12/15 Robert Gründler

Re: Dataimport performance

2010-12-15 Thread Bernd Fehling
We are currently running Solr 4.x from trunk.
-d64 -Xms10240M -Xmx10240M
Total Rows Fetched: 24935988
Total Documents Skipped: 0
Total Documents Processed: 24568997
Time Taken: 5:55:19.104
24.5 million docs as XML from the filesystem in less than 6 hours. Maybe your MySQL is the bottleneck?

Re: Dataimport performance

2010-12-15 Thread Tim Heckman
2010/12/15 Robert Gründler rob...@dubture.com: The data-config.xml looks like this (only 1 entity):      entity name=track query=select t.id as id, t.title as title, l.title as label from track t left join label l on (l.id = t.label_id) where t.deleted = 0 transformer=TemplateTransformer  

Re: Dataimport performance

2010-12-15 Thread Robert Gründler
i've benchmarked the import already with 500k records, one time without the artists subquery, and one time without the join in the main query:
Without subquery: 500k in 3 min 30 sec
Without join and without subquery: 500k in 2 min 30.
With subquery and with left join: 320k in 6 Min 30
so
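
To make these three timings easier to compare, they can be converted to rows per second. The sketch below does only that arithmetic; the helper name `rows_per_second` is made up for illustration. It works out to roughly 2,380, 3,330 and 820 rows per second respectively.

```python
def rows_per_second(rows, minutes, seconds=0):
    """Convert a row count and an elapsed time into rows per second."""
    return rows / (minutes * 60 + seconds)


# Figures taken from the benchmark message above.
benchmarks = {
    "without subquery": rows_per_second(500_000, 3, 30),
    "without join and without subquery": rows_per_second(500_000, 2, 30),
    "with subquery and with left join": rows_per_second(320_000, 6, 30),
}

for label, rate in benchmarks.items():
    print(f"{label}: {rate:.0f} rows/s")
```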

Re: Dataimport performance

2010-12-15 Thread Tim Heckman
The custom import I wrote is a java application that uses the SolrJ library. Basically, where I had sub-entities in the DIH config I did the mappings inside my java code.
1. Identify a subset or chunk of the primary id's to work on (so I don't have to load everything into memory at once) and put
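
The remaining steps are cut off in this archive, so the following is only a rough sketch of the general idea of chunked importing with in-code mapping, not the author's Java/SolrJ program. It assumes hypothetical tables `track`, `artist` and `track_artist` with positive integer ids, uses SQLite so it runs standalone, and leaves out the step that would send the documents to Solr. For each chunk of ids it issues one query for the child rows instead of one query per parent row.

```python
import sqlite3
from collections import defaultdict


def id_chunks(conn, size):
    """Yield lists of track ids, at most `size` at a time (keyset paging)."""
    last = -1  # assumption: ids are positive integers
    while True:
        cur = conn.execute(
            "SELECT id FROM track WHERE id > ? ORDER BY id LIMIT ?", (last, size)
        )
        ids = [row[0] for row in cur]
        if not ids:
            return
        yield ids
        last = ids[-1]


def build_docs(conn, chunk_size=500):
    """Yield one dict per track, attaching artists with one query per chunk."""
    for ids in id_chunks(conn, chunk_size):
        marks = ",".join("?" * len(ids))  # e.g. "?,?,?"
        artists = defaultdict(list)
        child_sql = (
            "SELECT ta.track_id, a.name FROM track_artist ta "
            "JOIN artist a ON a.id = ta.artist_id "
            f"WHERE ta.track_id IN ({marks}) ORDER BY a.name"
        )
        for track_id, name in conn.execute(child_sql, ids).fetchall():
            artists[track_id].append(name)
        parent_sql = f"SELECT id, title FROM track WHERE id IN ({marks}) ORDER BY id"
        for track_id, title in conn.execute(parent_sql, ids).fetchall():
            yield {"id": track_id, "title": title, "artists": artists[track_id]}


# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE track (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE track_artist (track_id INTEGER, artist_id INTEGER);
    INSERT INTO track VALUES (1, 'First'), (2, 'Second'), (3, 'Third');
    INSERT INTO artist VALUES (10, 'Ann'), (11, 'Bob');
    INSERT INTO track_artist VALUES (1, 10), (1, 11), (3, 11);
""")

docs = list(build_docs(conn, chunk_size=2))
for doc in docs:
    print(doc)
```

With a chunk size of N, the number of child queries drops from one per track to one per N tracks, which is the saving this kind of importer is after.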

Re: Dataimport performance

2010-12-15 Thread Lance Norskog
Can you do just one join in the top-level query? The DIH does not have a batching mechanism for these joins, but your database does. On Wed, Dec 15, 2010 at 7:11 AM, Tim Heckman theck...@gmail.com wrote: The custom import I wrote is a java application that uses the SolrJ library. Basically,