Hi Gora, your suggestion is good.

Two thoughts:

1. If both of the tables you are joining are in the same database under
the same user, you might want to check why the join is so slow. Maybe you
just need to add an index on a column that is used in the join condition
or in your WHERE clauses. Joins should not be slow.

2. If the tables are in different databases and you are joining them via
DIH, I tend to agree that this can get too slow (I think the connections
might not get pooled and the JDBC driver adds too much overhead, but be
warned that this is an assumption on my part).

If it's not possible for you to create a temporary table that aggregates
the required data before indexing, then your proposal is indeed a good
solution.

Another way I can think of right now, which would reduce your coding
effort and turn it into a configuration task. In your indexing procedure:

a) create a temporary Solr core on your Solr server (see the page on core
admin in the wiki)
b) index this tmp core with the text data
c) index your main core with the data by joining it to the already
existing Solr index in the tmp core (this is fast, I can assure you; use
URLDataSource with XPathEntityProcessor if you are on 1.4)
d) delete the tmp core (well, or keep it for next time)

Chantal


On Thu, 2010-07-29 at 11:51 +0200, Gora Mohanty wrote:
> Hi,
>
> We have a database that has numeric values for some columns, which
> correspond to text values in drop-downs on a website. We need to
> index both the numeric and text equivalents into Solr, and can do
> that via a lookup on a different table from the one holding the
> main data. We are currently doing this via a JOIN on the numeric
> field, between the main data table and the lookup table, but this
> dramatically slows down indexing.
>
> We could try using the CachedSqlEntity processor, but there are
> some issues in doing that, as the data import handler is quite
> complicated.
>
> As the lookups need to be done only once, I was planning the
> following:
> (a) Do the lookups in a custom data source that extends
> JDBCDataSource, and store them in arrays.
> (b) Implement a custom transformer that uses the array data
> to convert numeric values read from the database to text.
> Comments on this approach, or suggestions for simpler ones would be
> much appreciated.
>
> Regards,
> Gora
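
The lookup-once plan in steps (a) and (b) of the quoted message can be sketched roughly as follows. This is only an illustration of the idea in Python, not DIH code: in DIH the same logic would live in a Java transformer class. The table names (colour_lookup, items), the column names, the "_text" suffix and the in-memory SQLite database are all made up for the example.

```python
import sqlite3


def load_lookup(conn, table, key_col, text_col):
    """Read the whole lookup table once and keep it in a dict (step a)."""
    cur = conn.execute(f"SELECT {key_col}, {text_col} FROM {table}")
    return dict(cur.fetchall())


def transform_row(row, lookups):
    """Add a '<column>_text' field for each column that has a lookup (step b).

    Codes that are missing from the lookup table map to None.
    """
    out = dict(row)
    for column, mapping in lookups.items():
        if column in row:
            out[column + "_text"] = mapping.get(row[column])
    return out


def demo():
    # Hypothetical schema: a small lookup table plus a main data table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE colour_lookup (id INTEGER PRIMARY KEY, label TEXT)")
    conn.executemany("INSERT INTO colour_lookup VALUES (?, ?)",
                     [(1, "red"), (2, "green")])
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, colour INTEGER)")
    conn.executemany("INSERT INTO items VALUES (?, ?)",
                     [(10, 1), (11, 2), (12, 3)])

    # One query against the lookup table, then no JOIN in the main query.
    lookups = {"colour": load_lookup(conn, "colour_lookup", "id", "label")}
    rows = conn.execute("SELECT id, colour FROM items ORDER BY id")
    docs = [transform_row({"id": i, "colour": c}, lookups) for i, c in rows]
    conn.close()
    return docs
```

Calling demo() returns one dict per row of the main table, each carrying both the numeric code and its text equivalent. The lookup table is read exactly once, so the per-row cost is a dict access rather than a JOIN.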