Hi

I have looked but cannot find any clear answers to this on
the Interwebs.


I have an index with, say, 10 fields.

I load that index directly from Oracle via data-config.xml
(the DataImportHandler) using JDBC.  I can load 10 million
rows very quickly.  This direct way of loading from Oracle
straight into Solr is fantastic - really efficient, and it
saves writing loads of import/export code (e.g. via a CSV
file).
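
For illustration, the single-entity config is something along these
lines (connection details, table and column names are made up):

    <dataConfig>
      <!-- Oracle connection via the thin JDBC driver -->
      <dataSource type="JdbcDataSource"
                  driver="oracle.jdbc.OracleDriver"
                  url="jdbc:oracle:thin:@//dbhost:1521/ORCL"
                  user="solr_user" password="secret"/>
      <document>
        <!-- one SQL query streams all 10m rows straight into the index -->
        <entity name="main"
                query="SELECT ID, FIELD1, FIELD2 FROM MAIN_TABLE">
          <field column="ID"     name="id"/>
          <field column="FIELD1" name="field1"/>
          <!-- ... remaining fields mapped the same way ... -->
        </entity>
      </document>
    </dataConfig>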

Of those 10 fields, two (set to multiValued) come from a
separate table, with anywhere from 1 to 10 child rows per
row in the main table.

I can use a nested entity to extract the child rows for each of
the 10m rows in the main table - but then Solr generates 10m
separate SQL queries (one per parent row), and the load time
goes from a few minutes to several days.
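
The nested-entity version looks roughly like this (names invented
again); the inner query runs once for every parent row, which is
where the 10m extra SQL calls come from:

    <entity name="main"
            query="SELECT ID, FIELD1, FIELD2 FROM MAIN_TABLE">
      <!-- executed once per parent row, i.e. 10m times -->
      <entity name="tags"
              query="SELECT TAG FROM TAG_TABLE
                     WHERE MAIN_ID = '${main.ID}'">
        <field column="TAG" name="tag"/>
      </entity>
    </entity>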

On smaller tables - just a few thousand rows - I can use a
second nested entity with a JDBC call - but not for very large
tables.

Could I load the data in two steps:
1)  load the main 10m rows
2)  load into the existing index by adding the data from a
    second SQL call into fields of each existing document
    (i.e. an UPDATE instead of an INSERT - see the sketch
    below).
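
I'm guessing step 2 would look something like Solr's atomic
updates - i.e. posting something like the following for each
existing document (this assumes Solr 4+, a uniqueKey called "id",
and that the fields involved are stored; field names invented):

    <add>
      <doc>
        <field name="id">12345</field>
        <!-- "add" appends values to the multiValued field on the
             existing document; "set" would replace them -->
        <field name="tag" update="add">red</field>
        <field name="tag" update="add">blue</field>
      </doc>
    </add>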

I don't know what syntax/option might achieve that.  There
is incremental (delta) loading - but I think that replaces
whole documents rather than updating individual fields.  Or
maybe it can do both?
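
For reference, my understanding of the delta-import syntax is
roughly this (column names invented) - it re-selects and re-indexes
whole documents whose keys the deltaQuery returns:

    <entity name="main"
            query="SELECT ID, FIELD1 FROM MAIN_TABLE"
            deltaQuery="SELECT ID FROM MAIN_TABLE
                        WHERE LAST_MODIFIED &gt;
                              TO_DATE('${dih.last_index_time}',
                                      'YYYY-MM-DD HH24:MI:SS')"
            deltaImportQuery="SELECT ID, FIELD1 FROM MAIN_TABLE
                              WHERE ID = '${dih.delta.ID}'">
      ...
    </entity>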

Any other techniques that would be fast/efficient?

Help!

--
Cheers
Jules.
