On Thu, 29 Jul 2010 12:30:50 +0200 Chantal Ackermann <chantal.ackerm...@btelligent.de> wrote:
> Hi Gora,
>
> your suggestion is good.
>
> Two thoughts:
> 1. if both of the tables you are joining are in the same database
> under the same user you might want to check why the join is so
> slow. Maybe you just need to add an index on a column that is
> used in your WHERE clauses. Joins should not be slow.

Hmm, that is a very good point. You can probably tell that I am a
novice at databases :-) Currently, I am probably doing the joins in
a naive way, and that slows things down by about an order of
magnitude.

> 2. if the tables are in different databases and you are joining
> them via DIH I tend to agree that this can get too slow (I think
> the connections might not get pooled and the jdbc driver adds too
> much overhead - ATTENTION ASSUMPTION).

They are in the same database.

> If it's not a possibility for you to create a temporary table that
> aggregates the required data before indexing, then your proposal
> is indeed a good solution.

Unfortunately, it is not easily doable for us to recreate the
database. Forgot to mention that.

> Another way I can think of right now, that would only reduce your
> coding effort and change it to a configuration task:
> In your indexing procedure do:
> a) create a temporary solr core on your solr server (see the page
> on core admin in the wiki)
> b) index this tmp core with the text data
> c) index your main core with the data by joining it to the already
> existing solr index in the tmp core (this is fast, I can assure
> you, use URLDataSource with XPathEntityProcessor if you are on
> 1.4)
> d) delete the tmp core (well, or keep it for next time)
[...]

Another great idea, and one which should be less work than a custom
datasource, plus a custom transformer. Thank you very much.

Regards,
Gora
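
For reference, the first suggestion above (an index on the column used in the WHERE clause) can be sketched roughly as follows. This is only an illustration under assumptions, not the schema or query from this thread: it uses SQLite through Python's sqlite3 module as a stand-in for the real database, and the table, column, and index names (docs, texts, doc_id, idx_texts_doc_id) are hypothetical.

```python
import sqlite3

# Hypothetical join filtered on texts.doc_id; not the query from this thread.
QUERY = (
    "SELECT d.title, t.body "
    "FROM texts t JOIN docs d ON d.id = t.doc_id "
    "WHERE t.doc_id = ?"
)


def query_plan(conn, doc_id):
    """Return the plan steps SQLite reports for QUERY as a list of strings."""
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (doc_id,)).fetchall()
    # The fourth column of each plan row holds the human-readable description.
    return [row[3] for row in plan_rows]


# In-memory stand-in database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE texts (doc_id INTEGER, body TEXT)")
conn.execute("INSERT INTO docs VALUES (1, 'first document')")
conn.execute("INSERT INTO texts VALUES (1, 'some text')")

# Plan without an index on the filtered column.
plan_before = query_plan(conn, 1)

# Add the index on the column used in the WHERE clause, then re-check the plan.
conn.execute("CREATE INDEX idx_texts_doc_id ON texts (doc_id)")
plan_after = query_plan(conn, 1)

rows = conn.execute(QUERY, (1,)).fetchall()

print("before:", plan_before)
print("after: ", plan_after)
print("rows:  ", rows)
```

Comparing the two plans shows whether the database starts using the new index for the lookup on texts. Most databases have an equivalent of EXPLAIN, so the same check can be run against the real join before changing anything in the DIH configuration.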