Re: Implementing lookups while importing data

Gora Mohanty Thu, 29 Jul 2010 06:31:22 -0700

On Thu, 29 Jul 2010 12:30:50 +0200
Chantal Ackermann <chantal.ackerm...@btelligent.de> wrote:


> Hi Gora,
> 
> your suggestion is good.
> 
> Two thoughts:
> 1. if both of the tables you are joining are in the same database
> under the same user you might want to check why the join is so
> slow. Maybe you just need to add an index on a column that is
> used in your WHERE clauses. Joins should not be slow.

Hmm, that is a very good point. You can probably tell that I am
a novice at databases :-) Currently, I am probably doing the joins
naively, which slows things down by about an order of magnitude.
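
To illustrate the indexing suggestion, here is a minimal sketch using
Python's built-in sqlite3 module rather than our actual database. The
table and column names (items, texts, item_id) and the index name are
made up for the example. It asks the database for its query plan for a
per-row lookup of the kind used when joining, before and after adding
an index on the column in the WHERE clause:

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(str(row[-1]) for row in rows)


# Hypothetical schema: a parent table plus a table of text keyed by item_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE texts (item_id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, f"item {i}") for i in range(1000)],
)
conn.executemany(
    "INSERT INTO texts VALUES (?, ?)",
    [(i, f"text for item {i}") for i in range(1000)],
)

# The lookup that is run once per parent row.
lookup = "SELECT body FROM texts WHERE item_id = ?"

plan_before = query_plan(conn, lookup, (42,))

# Index the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_texts_item_id ON texts (item_id)")

plan_after = query_plan(conn, lookup, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

Without the index the plan is a scan of the whole texts table for
every lookup; with it, the plan names idx_texts_item_id. Other
databases have their own EXPLAIN variants, so this only shows the
idea.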

> 2. if the tables are in different databases and you are joining
> them via DIH I tend to agree that this can get too slow (I think
> the connections might not get pooled and the jdbc driver adds too
> much overhead - ATTENTION ASSUMPTION).

They are in the same database.

> If it's not a possibility for you to create a temporary table that
> aggregates the required data before indexing, then your proposal
> is indeed a good solution.

Unfortunately, it is not easily doable for us to recreate the
database. Forgot to mention that.

> Another way I can think of right now, that would only reduce your
> coding effort and change it to a configuration task:
> In your indexing procedure do:
> a) create a temporary solr core on your solr server (see the page
> on core admin in the wiki)
> b) index this tmp core with the text data
> c) index your main core with the data by joining it to the already
> existing solr index in the tmp core (this is fast, I can assure
> you, use URLDataSource with XPathEntityProcessor if you are on
> 1.4)
> d) delete the tmp core (well, or keep it for next time)
[...]

Another great idea, and one which should be less work than a custom
datasource, plus a custom transformer. Thank you very much.
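
As I understand step (c), each row of the main import would look up
its text in the tmp core over HTTP and pick the field out of the XML
response. A rough sketch of that lookup in Python follows, only to
make the idea concrete. The host, the core name "tmp", the field
names "id" and "text", and the sample response are all assumptions,
and the real thing would be a data-config.xml using URLDataSource
and XPathEntityProcessor rather than Python:

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed location of the temporary core; adjust to the actual setup.
TMP_CORE_SELECT = "http://localhost:8983/solr/tmp/select"


def lookup_url(item_id):
    """Build the select URL that fetches the text for one item."""
    return TMP_CORE_SELECT + "?" + urlencode({"q": f"id:{item_id}", "fl": "text"})


def extract_text(response_xml):
    """Pull the 'text' field values out of a Solr XML response."""
    root = ET.fromstring(response_xml)
    # Same path an XPath-based processor would use:
    # /response/result/doc/str[@name='text']
    return [el.text for el in root.findall("./result/doc/str[@name='text']")]


# Hand-written sample of a response, not real output from a server.
sample_response = """<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader"><int name="status">0</int></lst>
  <result name="response" numFound="1" start="0">
    <doc><str name="text">text for item 42</str></doc>
  </result>
</response>"""

url = lookup_url(42)
texts = extract_text(sample_response)
print(url)
print(texts)
```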

Regards,
Gora
