Hi, I have looked and cannot see any clear answers to this on the Interwebs.
I have an index with, say, 10 fields. I load that index directly from Oracle via JDBC, using data-config.xml, and I can load 10 million rows very quickly. This direct way of loading from Oracle straight into Solr is fantastic: really efficient, and it saves writing loads of import/export code (e.g. via a CSV file).

Of those 10 fields, two (set to multiValued) come from a separate table, with anywhere from 1 to 10 child rows per row in the main table. I can use a nested entity to extract the child rows for each of the 10m main-table rows, but then Solr generates 10m separate SQL calls and the load time goes from a few minutes to several days. On smaller tables of just a few thousand rows, a second nested entity with a JDBC call works fine, but not for very large tables. (I've pasted a simplified version of my config at the bottom.)

Could I instead load the data in two steps:

1) load the main 10m rows;
2) load into the existing index by adding the data from a second SQL call into fields on each existing row (i.e. an UPDATE instead of an INSERT)?

I don't know what syntax/option might achieve that. There is incremental loading, but I think that replaces whole documents rather than updating individual fields. Or maybe it does both? Any other techniques that would be fast/efficient? Help!

-- Cheers, Jules.
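P.S. For reference, here is roughly what my data-config.xml looks like now, heavily simplified; the table, column, and connection names are made up:

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@//dbhost:1521/ORCL"
              user="scott" password="tiger"/>
  <document>
    <!-- main entity: a single SELECT streams all 10m rows -->
    <entity name="main" query="SELECT id, field1, field2 FROM main_table">
      <!-- nested entity: fires once per main row, i.e. 10m separate SELECTs -->
      <entity name="child"
              query="SELECT tag FROM child_table WHERE main_id = '${main.id}'">
        <field column="tag" name="tags"/>
      </entity>
    </entity>
  </document>
</dataConfig>

It's the ${main.id} reference in the nested entity that turns the import into one SQL call per parent row.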
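P.P.S. For step 2, this is the kind of per-field update I'm imagining. I'm assuming Solr's atomic-update XML syntax here (update="add" appends to a multiValued field, update="set" replaces it), which as far as I can tell needs <updateLog/> enabled and the other fields stored so Solr can rebuild the rest of the document. The field names are placeholders for my real schema:

<add>
  <doc>
    <field name="id">12345</field>
    <!-- append one child-row value to the multiValued field
         on the already-indexed document -->
    <field name="tags" update="add">child-value</field>
  </doc>
</add>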