Re: import efficiencies

2016-05-26 Thread John Blythe
all good ideas and recs, guys. erick, i'd thought of much the same after reading through the SolrJ post and beginning to get a bit anxious at the idea of implementation (not a java dev here lol). we're already doing some processing before the import, taking a few million records, rolling them up /

Re: import efficiencies

2016-05-26 Thread John Bickerstaff
Having more carefully read Erick's post - I see that is essentially what he said in a much more straightforward way. I will also second Erick's suggestion of hammering on the SQL. We found that fruitful many times at the same gig. I develop and am not a SQL master. In a similar situation I'll

Re: import efficiencies

2016-05-26 Thread John Bickerstaff
It may or may not be helpful, but there's a similar class of problem that is frequently solved either by stored procedures or by running the query on a time-frame and storing the results... Doesn't matter if the end-point for the data is Solr or somewhere else. The problem is long running

Re: import efficiencies

2016-05-26 Thread Erick Erickson
Forgot to add... sometimes really hammering at the SQL query in DIH can be fruitful: can you make a huge, monster query that's faster than the sub-queries? I've also seen people run processes on the DB that move all the data into a temporary place making use of all of the nifty stuff you can do
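
To make the "one monster query vs. sub-queries" idea concrete, here is a minimal sketch. It is not the poster's schema: the `transactions` and `products` tables and their columns are hypothetical, and an in-memory SQLite database stands in for MySQL so the example runs anywhere. It compares a per-row lookup (roughly what a nested DIH entity does) with a single JOIN that returns the same rows in one round trip.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory database (hypothetical schema, SQLite standing in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE products (uid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE transactions (
            id INTEGER PRIMARY KEY,
            product_uid INTEGER,
            amount REAL
        );
        INSERT INTO products VALUES (1, 'widget'), (2, 'gadget');
        INSERT INTO transactions VALUES (10, 1, 5.0), (11, 2, 7.5), (12, 1, 2.25);
        """
    )
    return conn


def rows_via_subqueries(conn):
    """One extra lookup query per parent row (the N+1 pattern)."""
    parents = conn.execute(
        "SELECT id, product_uid, amount FROM transactions ORDER BY id"
    ).fetchall()
    out = []
    for tid, uid, amount in parents:
        name = conn.execute(
            "SELECT name FROM products WHERE uid = ?", (uid,)
        ).fetchone()[0]
        out.append((tid, name, amount))
    return out


def rows_via_join(conn):
    """The same result from a single joined query."""
    return conn.execute(
        """
        SELECT t.id, p.name, t.amount
        FROM transactions t
        JOIN products p ON p.uid = t.product_uid
        ORDER BY t.id
        """
    ).fetchall()


conn = build_db()
same = rows_via_subqueries(conn) == rows_via_join(conn)
```

With three parent rows the sub-query version issues four statements and the join issues one; whether the join is actually faster on a real MySQL instance depends on indexes and data volume, so it is worth timing both.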

Re: import efficiencies

2016-05-26 Thread John Blythe
oo gotcha. cool, will make sure to check it out and bounce any related questions through here. thanks! best, John Blythe

Re: import efficiencies

2016-05-26 Thread Erick Erickson
Solr commits aren't the issue, I'd guess. All the time is probably being spent getting the data from MySQL. I've had some luck writing to Solr from a DB through a SolrJ program; here's a place to get started: searchhub.org/2012/02/14/indexing-with-solrj/ you can peel out the Tika bits pretty
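
For readers who are not Java developers, the general shape of such a "read from the DB, write to Solr" program can be sketched in a few lines. This is only an illustration of the batching pattern, not the code from the linked post: `send_batch` is a hypothetical callable standing in for whatever client call actually posts documents to Solr, and the field names `id` and `name_s` are assumptions.

```python
from itertools import islice


def batched(rows, batch_size):
    """Yield lists of at most batch_size items from any iterable of rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def index_rows(rows, send_batch, batch_size=1000):
    """Turn DB rows into documents and hand them to send_batch in chunks.

    rows: iterable of (id, name) tuples, e.g. a database cursor.
    send_batch: hypothetical callable that posts a list of documents to Solr.
    Returns the number of documents sent.
    """
    total = 0
    for batch in batched(rows, batch_size):
        docs = [{"id": str(r[0]), "name_s": r[1]} for r in batch]
        send_batch(docs)
        total += len(docs)
    return total


# Usage with a stand-in sender that just records what it was given.
sent = []
count = index_rows([(i, f"n{i}") for i in range(5)], sent.append, batch_size=2)
```

Sending documents in batches rather than one at a time keeps the number of requests down, and streaming rows from the cursor avoids holding the whole result set in memory.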

import efficiencies

2016-05-26 Thread John Blythe
hi all, i've got layered entities in my solr import. it's calling on some transactional data from a MySQL instance. there are two fields that are then used to look up other information from other tables via their related UIDs, one of which has its own child entity with yet another select statement to