The embedded approach is at http://wiki.apache.org/solr/EmbeddedSolr
For my testing I have a tunable setting for the number of records submitted per batch, and I used 10 per batch. Both approaches committed after every 1000 records, which is also tunable.

A custom Lucene implementation I helped write was even faster than embedded, using a RAM drive as a double buffer. However, it did require a much larger memory footprint.

The embedded classes have little to no documentation and almost look like stub implementations, but they work well.

While this project will succeed in large part because of how easy it is to integrate with non-Java clients, I would actually like to see it become more Java-friendly, for example by providing a reference indexing implementation. There are a lot of tools, like SimplePostTool, that could be more widely useful. With a few API changes it could serve both as the demo and as a useful library. Instead, I extended it, then had to abandon that and resort to cut-and-paste reuse in the end. The functionality was 95% there; it just needed API tweaks to make it usable.

It also seems unusual to expose fields directly in the Java code instead of using accessors. Accessors can give a lot of flexibility that field access doesn't have.

It would also be nice to be able to get Java objects back besides XML and JSON, like an embedded equivalent for search. That way you could integrate more easily with Spring MVC, etc. There may also be some performance gains there.

Paul Sundling

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Friday, August 24, 2007 2:07 PM
To: solr-user@lucene.apache.org
Subject: Re: Embedded about 50% faster for indexing

On 8/24/07, Sundling, Paul <[EMAIL PROTECTED]> wrote:
> Created two indexer implementations to test HTTP Post versus Embedded
> and the performance was 54.6% faster on embedded.
>
> Thought others might find that interesting that are using Java.

Paul, were the documents posted one-per-message, or did you try multiple (like 50 to 100) per message?
If one per message, the best way to increase performance is to have multiple threads adding docs. I'd also be curious to know how a single CSV file would clock in.

One thing I'd like to avoid is everyone trying to embed just for performance gains. If there really is that much difference, then we need a better way for people to get that performance without resorting to Java code.

-Yonik
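The batching discussed in this thread (Paul's 10 documents per batch with a commit every 1000 records, and Yonik's suggestion of 50 to 100 documents per message) can be sketched in Java roughly as follows. This is a minimal sketch, not Paul's indexer: it builds Solr XML update messages (an `<add>` holding several `<doc>` elements, plus `<commit/>`) and hands each message body to a `Consumer<String>` sink. The `BatchingIndexer` class and the sink are hypothetical; in a real indexer the sink would POST the body to Solr's update handler, which is left out here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: groups documents into multi-document Solr XML <add>
 * messages and emits <commit/> once enough documents have been sent.
 * The sink stands in for whatever actually POSTs the message body.
 */
public class BatchingIndexer {
    private final int batchSize;       // docs per <add> message (e.g. 10, or 50-100)
    private final int commitInterval;  // docs between commits (e.g. 1000)
    private final Consumer<String> sink;
    private final List<Map<String, String>> pending = new ArrayList<>();
    private int sinceCommit = 0;

    public BatchingIndexer(int batchSize, int commitInterval, Consumer<String> sink) {
        this.batchSize = batchSize;
        this.commitInterval = commitInterval;
        this.sink = sink;
    }

    /** Buffers one document (field name -> value); sends a message when the batch is full. */
    public void add(Map<String, String> doc) {
        pending.add(doc);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends buffered docs as one <add>; commits once at least commitInterval docs were sent since the last commit. */
    public void flush() {
        if (pending.isEmpty()) return;
        StringBuilder sb = new StringBuilder("<add>");
        for (Map<String, String> doc : pending) {
            sb.append("<doc>");
            for (Map.Entry<String, String> f : doc.entrySet()) {
                sb.append("<field name=\"").append(escape(f.getKey())).append("\">")
                  .append(escape(f.getValue())).append("</field>");
            }
            sb.append("</doc>");
        }
        sb.append("</add>");
        sink.accept(sb.toString());
        sinceCommit += pending.size();
        pending.clear();
        if (sinceCommit >= commitInterval) {
            commit();
        }
    }

    public void commit() {
        sink.accept("<commit/>");
        sinceCommit = 0;
    }

    /** End of run: send any partial batch, then commit whatever is uncommitted. */
    public void close() {
        flush();
        if (sinceCommit > 0) commit();
    }

    /** Minimal XML escaping for field names and values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        List<String> messages = new ArrayList<>();
        BatchingIndexer indexer = new BatchingIndexer(10, 1000, messages::add);
        for (int i = 0; i < 2500; i++) {
            indexer.add(Map.of("id", Integer.toString(i)));
        }
        indexer.close();
        long adds = messages.stream().filter(m -> m.startsWith("<add>")).count();
        long commits = messages.stream().filter(m -> m.equals("<commit/>")).count();
        System.out.println(adds + " add messages, " + commits + " commits");
    }
}
```

With 2,500 documents, a batch size of 10 and a commit interval of 1,000, `main` prints `250 add messages, 3 commits`: one commit after document 1,000, one after 2,000, and a final one in `close()` for the remaining 500.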
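For the one-document-per-message case, Yonik's suggestion of having multiple threads add documents might look like the following minimal sketch built on `java.util.concurrent`. The `ParallelPoster` class and its `DocSender` interface are hypothetical placeholders for whatever actually POSTs a message to Solr; a real sender would have to be safe to call from several threads at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: posts one-document messages from a fixed pool of worker threads. */
public class ParallelPoster {

    /** Hypothetical stand-in for "POST one <add> message to Solr"; must be thread-safe. */
    public interface DocSender {
        void send(String addMessage) throws Exception;
    }

    /** Submits every message to nThreads workers, waits for all of them, and returns how many were sent. */
    public static int sendAll(List<String> messages, int nThreads, DocSender sender) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        AtomicInteger sent = new AtomicInteger();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String msg : messages) {
                // Returning a value makes this lambda a Callable, so it may throw checked exceptions.
                futures.add(pool.submit(() -> {
                    sender.send(msg);
                    sent.incrementAndGet();
                    return null;
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // waits; a worker failure surfaces here as an ExecutionException
            }
        } finally {
            pool.shutdown(); // accepts no new tasks; already-queued ones still run
        }
        return sent.get();
    }

    public static void main(String[] args) throws Exception {
        List<String> messages = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            messages.add("<add><doc><field name=\"id\">" + i + "</field></doc></add>");
        }
        ConcurrentLinkedQueue<String> received = new ConcurrentLinkedQueue<>();
        int n = sendAll(messages, 4, received::add);
        System.out.println(n + " sent, " + received.size() + " received");
    }
}
```

With 100 messages and 4 threads, `main` prints `100 sent, 100 received`. A fixed-size pool keeps the number of concurrent requests bounded; a good thread count for a given server would have to be found by measurement.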
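Paul's point about accessors versus public fields can be illustrated with a small example. The class names and the update URL below are hypothetical and illustrative only; this is not SimplePostTool's actual code. The difference shows up when extending a class: a subclass can override a getter, but a field it redeclares only hides the parent's field, so callers holding a base-class reference still see the original value.

```java
/** Hypothetical config that exposes a public field. */
class FieldStyleConfig {
    public String url = "http://localhost:8983/solr/update";
}

/** Redeclaring the field only hides it; it does not override it. */
class FieldStyleSub extends FieldStyleConfig {
    public String url = "http://indexer2:8983/solr/update";
}

/** Hypothetical config that keeps the same setting behind an accessor. */
class AccessorStyleConfig {
    private final String url = "http://localhost:8983/solr/update";
    public String getUrl() { return url; }
}

/** Overriding the accessor changes what every caller sees. */
class AccessorStyleSub extends AccessorStyleConfig {
    @Override
    public String getUrl() { return "http://indexer2:8983/solr/update"; }
}

public class AccessorDemo {
    public static void main(String[] args) {
        FieldStyleConfig f = new FieldStyleSub();
        AccessorStyleConfig a = new AccessorStyleSub();
        // Field access is resolved by the static type, so the subclass value is ignored.
        System.out.println("field:    " + f.url);      // http://localhost:8983/solr/update
        // Method calls dispatch on the runtime type, so the override wins.
        System.out.println("accessor: " + a.getUrl()); // http://indexer2:8983/solr/update
    }
}
```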