Yep, good ol’ ETL. My solution was to dump the data as one JSON object per document, run an optional transform step, then use a multi-threaded Python loader that was schema-independent. The multi-threaded loader ran much faster than DIH.
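A minimal sketch of that kind of loader, assuming one JSON document per line (JSON Lines) and a stock Solr JSON update endpoint; the URL, collection name, batch size, and thread count below are illustrative, not the original program:

```python
# Sketch of a schema-independent, multi-threaded Solr loader.
# Assumes newline-delimited JSON input (one document per line).
# SOLR_URL, BATCH_SIZE, and THREADS are placeholder values.
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

SOLR_URL = "http://localhost:8983/solr/mycollection/update?commit=false"
BATCH_SIZE = 500
THREADS = 8

def batches(lines, size=BATCH_SIZE):
    """Group an iterable of JSON lines into lists of parsed documents."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield [json.loads(line) for line in chunk]

def post_batch(docs):
    """POST one batch of documents to Solr as a JSON array."""
    body = json.dumps(docs).encode("utf-8")
    req = urllib.request.Request(
        SOLR_URL, data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status

def load(path):
    """Read a .jsonl file and load it with THREADS concurrent workers."""
    with open(path) as f, ThreadPoolExecutor(max_workers=THREADS) as pool:
        for status in pool.map(post_batch, batches(f)):
            assert status == 200

# usage: load("docs.jsonl")
```

Because the loader never looks inside the documents, the same script works for any core: the schema lives entirely in the dump/transform steps.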
This approach also easily supports re-indexing and re-running after a failure.

Back with Solr 1.3, before DIH, I wrote a Java program to fetch from the database, then load. That did some transformation, mostly making queue adds comparable with views (this was at Netflix).

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 29, 2025, at 10:05 AM, Dmitri Maziuk <dmitri.maz...@gmail.com> wrote:
>
> On 5/29/25 11:43, Sarah Weissman wrote:
>
>> I’ve been banging my head against this all week and I’m trying to figure
>> out the best way forward at this point. Is DIH still a viable option or
>> should I be moving off of that to something else? Any advice or
>> perspectives on this would be appreciated.
>
> All you need is a DB reader script, a Solr POSTer script, and a filter in
> between where you can do all your transforms. It can easily take less time
> and effort to write than figuring out how to get the DIH working "in the
> cloud".
>
> Dima
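The reader/filter/POSTer split Dima describes above could be sketched as three small functions; sqlite3 and the table and column names here are stand-ins (any DB-API driver would slot in the same way), not anyone's actual scripts:

```python
# Sketch of the DB reader -> filter -> POSTer pipeline.
# sqlite3 and the table/column names are illustrative stand-ins.
import json
import sqlite3

def read_rows(conn, query):
    """DB reader: yield each result row as a plain dict."""
    cur = conn.execute(query)
    cols = [d[0] for d in cur.description]
    for row in cur:
        yield dict(zip(cols, row))

def transform(rows):
    """Filter step: all per-document transforms go here."""
    for row in rows:
        row["id"] = str(row["id"])  # make the unique key a string
        # drop NULL columns so Solr only sees real values
        yield {k: v for k, v in row.items() if v is not None}

def to_update_body(docs):
    """POSTer input: Solr's JSON update body is just an array of docs."""
    return json.dumps(list(docs))

# usage:
#   conn = sqlite3.connect("app.db")
#   body = to_update_body(transform(read_rows(conn, "SELECT * FROM t")))
#   ...POST body to the collection's /update endpoint...
```

Keeping the three stages as separate generators means each one can be swapped or re-run independently, which is what makes re-indexing after a failure cheap.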