Thanks, everybody, for the information. Shawn, thanks for raising the issues around making sure each document is indexed successfully; with our current architecture, that is important for us.
Yonik's clarification about streaming really helped me understand one of the main advantages of CUSS:

>> When you add a document, it immediately writes it to a stream where Solr can read it off and index it. When you add a second document, it's immediately written to the same stream (or at least one of the open streams), as part of the same update request. No separate HTTP request, no separate update request.

In our use case, where documents are in the 700K-2MB range, I suspect that the overhead of opening/closing new requests is dwarfed by the time it takes just to send the data over the wire and parse it. However, I'm starting to think about whether I can find some time to do some testing.

Mikhail, thanks for suggesting looking at DIH; I haven't looked at it in several years and didn't realize there is now functionality for dealing with XML documents. When I asked about being able to read XML files from the filesystem, it was for the purpose of running some benchmark tests to see whether CUSS offers enough of an advantage to re-architect our system.

Currently the main bottleneck in our system is constructing Solr documents. We use multiple "document producers," each responsible both for creating a document and for sending it to Solr. Although each producer waits for a response from Solr before sending the next document to be indexed, we run 20-100 producers, so this is similar to CUSS running multiple threads (although of course we open a new HTTP request and Solr update request each time).

As far as using DIH or something like it goes, we might be able to use it for testing with already-created documents. Creating the documents requires assembling (and massaging) data from several sources, including a few database queries, unzipping and concatenating files on our filesystem, and querying another Solr instance that holds metadata.
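For a rough sense of why per-request overhead should matter less at these document sizes, here's a back-of-envelope sketch. The setup cost and bandwidth figures are purely assumed for illustration, not measured; plug in your own numbers:

```python
# Back-of-envelope estimate: per-request overhead vs. transfer time.
# All numbers are illustrative assumptions, not measurements:
#   setup_s       - assumed cost of opening a new HTTP connection plus a
#                   new Solr update request (handshake, headers, parsing)
#   bandwidth_bps - assumed effective throughput to Solr, in bytes/sec
#                   (here ~100 Mbit/s)

def overhead_fraction(doc_bytes, setup_s=0.005, bandwidth_bps=100e6 / 8):
    """Fraction of total request time spent on connection/request setup."""
    transfer_s = doc_bytes / bandwidth_bps
    return setup_s / (setup_s + transfer_s)

for size in (700_000, 2_000_000):
    frac = overhead_fraction(size)
    print(f"{size / 1e6:.1f} MB doc: ~{frac:.0%} of time is per-request overhead")
```

Under these assumed numbers the setup cost is a single-digit percentage of each request, which is why I suspect streaming would buy us less than it would for small documents; real measurements would of course settle it.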
I'm now thinking that for testing purposes it might be sufficient to construct dummy documents, as in the examples, rather than trying to use our actual documents. If the speed improvements look significant enough, I'll then need to figure out how to test with real documents. Thanks again for all the input.

Tom
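In case it's useful to anyone trying the same thing, a throwaway generator for dummy documents in our size range might look like the sketch below. The field names (`id`, `body_txt`) and the Solr XML update shape are placeholders; substitute the fields from your real schema:

```python
import random
import string

def make_dummy_doc(doc_id, target_bytes):
    """Build a Solr XML update with roughly target_bytes of filler text.

    Field names here are hypothetical placeholders; swap in the fields
    your schema actually uses before benchmarking.
    """
    filler = "".join(random.choices(string.ascii_lowercase + " ", k=target_bytes))
    return (
        "<add><doc>"
        f'<field name="id">{doc_id}</field>'
        f'<field name="body_txt">{filler}</field>'
        "</doc></add>"
    )

# Write a handful of files sized like our real documents (700K-2MB),
# so the benchmark reads them from the filesystem as DIH would.
for i in range(5):
    size = random.randint(700_000, 2_000_000)
    with open(f"dummy_{i}.xml", "w") as f:
        f.write(make_dummy_doc(i, size))
```

That would at least let the benchmark exercise realistic payload sizes without pulling in the database queries and unzipping steps our real pipeline needs.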