RE: Embedded about 50% faster for indexing

Sundling, Paul Fri, 24 Aug 2007 14:49:30 -0700

The embedded approach is at http://wiki.apache.org/solr/EmbeddedSolr

For my testing, the number of records submitted per batch is a tunable
setting, and I used 10 per batch.  Both approaches committed after every
1000 records, which is also tunable.
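
The batch and commit settings above can be sketched roughly as follows.
This is only an illustration, not the code I benchmarked: BatchingIndexer
and DocumentSink are hypothetical names rather than Solr classes, and the
sink stands in for either the HTTP POST path or the embedded path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: buffers documents, submits them in fixed-size
// batches, and commits after a configurable number of records.
class BatchingIndexer {

    // Hypothetical abstraction over "HTTP POST" or "embedded" indexing.
    interface DocumentSink {
        void submit(List<String> batch);
        void commit();
    }

    private final DocumentSink sink;
    private final int batchSize;       // e.g. 10 records per batch
    private final int commitInterval;  // e.g. commit every 1000 records
    private final List<String> pending = new ArrayList<>();
    private int sinceCommit = 0;

    BatchingIndexer(DocumentSink sink, int batchSize, int commitInterval) {
        this.sink = sink;
        this.batchSize = batchSize;
        this.commitInterval = commitInterval;
    }

    // Queue one document; send the batch once it is full.
    void add(String doc) {
        pending.add(doc);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Send whatever is buffered and commit if the interval was reached.
    private void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.submit(new ArrayList<>(pending));
        sinceCommit += pending.size();
        pending.clear();
        if (sinceCommit >= commitInterval) {
            sink.commit();
            sinceCommit = 0;
        }
    }

    // Send any partial batch and commit records not yet committed.
    void close() {
        flush();
        if (sinceCommit > 0) {
            sink.commit();
            sinceCommit = 0;
        }
    }
}
```

Keeping both values as constructor arguments means the same driver can be
pointed at either implementation with identical settings, which keeps the
timing comparison fair.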

A custom Lucene implementation I helped implement was even faster than
embedded, using a ramdrive as a double buffer.  However that did require
a much larger memory footprint.

The embedded classes have little to no documentation and almost look like
stub implementations, but they work well.

While this project will succeed in large part because of how easy it is
to integrate with non-Java clients, I would actually like to see this
project be more Java friendly, for example with a reference indexing
implementation.  There are a lot of tools that could be more widely
useful, like SimplePostTool.

With a few API changes it could serve both as the demo and as a useful
library.  Instead I extended it, then had to abandon that and resort to
cut-and-paste reuse in the end.  The functionality was 95% there; it
just needed API tweaks to make it usable.  It also seems unusual to
expose fields directly instead of using accessors in the Java code.
Accessors can give a lot of flexibility that field access doesn't
have.
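
To illustrate the accessor point, here is a small sketch.  The class
names, the field, and the URLs are all hypothetical and are not taken
from SimplePostTool; the point is only that a setter can validate its
input and a subclass can override a getter, neither of which is possible
with a directly exposed field.

```java
// Hypothetical settings holder; not an actual Solr class.
class PostSettings {
    // Illustrative default only.
    private String updateUrl = "http://localhost:8983/solr/update";

    String getUpdateUrl() {
        return updateUrl;
    }

    // A setter can reject bad values; a public field cannot.
    void setUpdateUrl(String updateUrl) {
        if (updateUrl == null || !updateUrl.startsWith("http")) {
            throw new IllegalArgumentException("Not an HTTP URL: " + updateUrl);
        }
        this.updateUrl = updateUrl;
    }
}

// A subclass can change how the value is produced without touching callers.
class CorePostSettings extends PostSettings {
    private final String core;

    CorePostSettings(String core) {
        this.core = core;
    }

    @Override
    String getUpdateUrl() {
        return "http://localhost:8983/solr/" + core + "/update";
    }
}
```

Code written against getUpdateUrl() keeps working when handed either
class, which is the kind of extension I could not do cleanly with the
fields exposed directly.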

It would also be nice to be able to get Java objects back besides XML
and JSON, like an Embedded equivalent for search.  That way you could
integrate more easily with Spring MVC, etc.  There may also be some
performance gains there.

Paul Sundling

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Friday, August 24, 2007 2:07 PM
To: solr-user@lucene.apache.org
Subject: Re: Embedded about 50% faster for indexing

On 8/24/07, Sundling, Paul <[EMAIL PROTECTED]> wrote:
> Created two indexer implementations to test HTTP Post versus Embedded 
> and the performance was 54.6% faster on embedded.
>
> Thought others might find that interesting that are using Java.

Paul, were the documents posted one-per-message, or did you try multiple
(like 50 to 100) per message?  If one per message, the best way to
increase performance is to have multiple threads adding docs.
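
A rough sketch of the multi-threaded suggestion, using a fixed thread
pool from java.util.concurrent.  ParallelPoster and DocSender are
hypothetical names, and send() stands in for whatever actually posts one
document (for example an HTTP POST to the update handler); this is an
illustration under those assumptions, not code from Solr.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: several worker threads each post one document
// per message, so requests overlap instead of running back to back.
class ParallelPoster {

    // Hypothetical callback that posts a single document.
    interface DocSender {
        void send(String doc) throws Exception;
    }

    static void postAll(List<String> docs, DocSender sender, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (final String doc : docs) {
            pool.execute(() -> {
                try {
                    sender.send(doc);
                } catch (Exception e) {
                    // A real indexer would retry or record the failure.
                    System.err.println("Failed to post document: " + e);
                }
            });
        }
        // Stop accepting work and wait for the queued posts to finish.
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
            throw new IllegalStateException("Posting did not finish in time");
        }
    }
}
```

The DocSender passed in must be safe to call from several threads at
once, and the commit would be issued once after postAll() returns.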

I'd be curious to know how a single CSV file would clock in as
well...

One thing I'd like to avoid is everyone trying to embed just for
performance gains. If there is really that much difference, then we need
a better way for people to get that without resorting to Java code.

-Yonik
