On 27-Aug-07, at 12:44 PM, Sundling, Paul wrote:

Whether embedded solr should give me a performance boost or not, it did.
:)  I'm not surprised, since it skips XML parsing.  Although you never
know where cycles are used for sure until you profile.

It certainly is possible that XML parsing dwarfs indexing, but I'd expect that only to occur under very light analysis and field storage workloads.

I tried doing more records per post (200) and it was actually slightly
slower and seemed to require more memory. This makes sense because you
have to take up more memory for the StringBuilder to store the much
larger XML. For 10,000 it was much slower. For that size I would need
to use XML streaming or something to make it work.
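
One way to avoid holding the whole batch in a StringBuilder is to stream the update XML straight to the output as each document is produced. Below is a minimal sketch, assuming the JDK's StAX XMLStreamWriter is available. The class name SolrXmlStreamer, the method writeBatch, and the map-of-fields document shape are hypothetical, and in practice the Writer would wrap the HTTP request body rather than an in-memory buffer.

```java
import java.io.Writer;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: streams a Solr <add> message one document at a time
// instead of building the whole batch in memory first.
final class SolrXmlStreamer {

    // Writes <add><doc><field name="...">...</field></doc>...</add> to out.
    // Each document is a map of field name to field value (an assumption
    // made for this sketch only).
    static void writeBatch(Iterable<Map<String, String>> docs, Writer out)
            throws XMLStreamException {
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartElement("add");
        for (Map<String, String> doc : docs) {
            xml.writeStartElement("doc");
            for (Map.Entry<String, String> field : doc.entrySet()) {
                xml.writeStartElement("field");
                xml.writeAttribute("name", field.getKey());
                xml.writeCharacters(field.getValue()); // escapes &, < and >
                xml.writeEndElement(); // </field>
            }
            xml.writeEndElement(); // </doc>
        }
        xml.writeEndElement(); // </add>
        xml.flush();
        xml.close(); // releases the XML writer; the underlying Writer stays open
    }
}
```

Because docs is an Iterable, the caller can feed records lazily (for example from a database cursor), so memory use no longer grows with the batch size.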

The solr war was on the same machine, so network overhead was only from
using loopback.

The big question is still your connection handling strategy: are you using persistent HTTP connections? Are you indexing from multiple threads?

cheers,
-Mike

Paul Sundling

-----Original Message-----
From: climbingrose [mailto:[EMAIL PROTECTED]
Sent: Monday, August 27, 2007 12:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Embedded about 50% faster for indexing


Haven't tried the embedded server but I think I have to agree with Mike.
We're currently sending 2000 job batches to SOLR server and the amount
of time required to transfer documents over http is insignificant
compared with the time required to index them. So I do think unless you
are sending documents one by one, embedded SOLR shouldn't give you much
more of a performance boost.

On 8/25/07, Mike Klaas <[EMAIL PROTECTED]> wrote:

On 24-Aug-07, at 2:29 PM, Wu, Daniel wrote:

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
Yonik Seeley
Sent: Friday, August 24, 2007 2:07 PM
To: solr-user@lucene.apache.org
Subject: Re: Embedded about 50% faster for indexing

One thing I'd like to avoid is everyone trying to embed just for
performance gains. If there is really that much difference, then we
need a better way for people to get that without resorting to Java
code.

-Yonik


Theoretically and practically, an embedded solution will be faster than
going through http/xml.

This is only true if the http interface adds significant overhead to
the cost of indexing a document, and I don't see why this should be
so, as indexing is relatively heavyweight.  Setting up the connection
could be expensive, but this can be greatly mitigated by sending more
than one doc per http request, using persistent connections, and
threading.
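
The mitigations above (several docs per request, and threading) can be sketched together roughly as follows. This is only an illustration under assumptions: BatchIndexer, partition, indexAll, and the poster callback are hypothetical names, and the callback stands in for whatever code POSTs one batch to Solr, ideally over a reused (persistent) HTTP connection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper: groups documents into batches and posts the batches
// from a small pool of worker threads.
final class BatchIndexer {

    // Splits docs into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            int end = Math.min(i + batchSize, docs.size());
            batches.add(new ArrayList<>(docs.subList(i, end)));
        }
        return batches;
    }

    // Hands each batch to poster on one of `threads` workers, then waits
    // until every batch has been posted.
    static <T> void indexAll(List<T> docs, int batchSize, int threads,
                             Consumer<List<T>> poster) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<T> batch : partition(docs, batchSize)) {
                pending.add(pool.submit(() -> poster.accept(batch)));
            }
            for (Future<?> f : pending) {
                f.get(); // rethrows any failure raised while posting
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as BatchIndexer.indexAll(docs, 200, 4, poster) would post 200-doc batches from four threads. The best batch size and thread count depend on the workload, so they are worth measuring rather than guessing.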

-Mike




--
Regards,

Cuong Hoang
