RE: Embedded about 50% faster for indexing

Sundling, Paul Mon, 27 Aug 2007 12:44:54 -0700

Whether or not embedded Solr should give me a performance boost, it did.
:)  I'm not surprised, since it skips XML parsing, although you never
know for sure where the cycles go until you profile.

I tried sending more records per post (200) and it was actually slightly
slower and seemed to require more memory.  This makes sense because the
StringBuilder needs more memory to hold the much larger XML.  With
10,000 records per post it was much slower.  For that size I would need
to use XML streaming or something similar to make it work.
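
For what it's worth, here is a rough sketch of what I mean by streaming.
It assumes the usual Solr <add><doc><field name="..."> update format and
uses the JDK's StAX XMLStreamWriter.  The class and method names
(StreamingAddWriter, writeAdd) are made up for illustration and are not
part of Solr.  Each document is written straight to a Writer, so the
whole message never has to sit in one big StringBuilder.

```java
import java.io.StringWriter;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, not part of Solr: streams an <add> message
// document by document instead of building one large string.
public class StreamingAddWriter {

    // Writes every document in docs as a <doc> element inside one <add>.
    // Each map entry becomes <field name="key">value</field>.
    public static void writeAdd(Iterable<Map<String, String>> docs, Writer out)
            throws XMLStreamException {
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartElement("add");
        for (Map<String, String> doc : docs) {
            xml.writeStartElement("doc");
            for (Map.Entry<String, String> field : doc.entrySet()) {
                xml.writeStartElement("field");
                xml.writeAttribute("name", field.getKey());
                // writeCharacters escapes &, < and > in the value
                xml.writeCharacters(field.getValue());
                xml.writeEndElement(); // </field>
            }
            xml.writeEndElement(); // </doc>
        }
        xml.writeEndElement(); // </add>
        xml.flush();
        xml.close(); // closes the XML writer only, not the underlying Writer
    }

    public static void main(String[] args) throws XMLStreamException {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("title", "Tom & Jerry");
        // A StringWriter is used here only to show the result; for real
        // posts the Writer would wrap the HTTP request's output stream.
        StringWriter out = new StringWriter();
        writeAdd(List.of(doc), out);
        System.out.println(out);
    }
}
```

I have not measured this against the StringBuilder approach, so treat it
as a starting point rather than a proven fix.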

The Solr WAR was on the same machine, so the only network overhead was
from using loopback.

Paul Sundling

-----Original Message-----
From: climbingrose [mailto:[EMAIL PROTECTED] 
Sent: Monday, August 27, 2007 12:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Embedded about 50% faster for indexing

Haven't tried the embedded server but I think I have to agree with Mike.
We're currently sending batches of 2000 jobs to the SOLR server and the
amount of time required to transfer documents over http is insignificant
compared with the time required to index them. So I do think that unless
you are sending documents one by one, embedded SOLR shouldn't give you
much of a performance boost.
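
To illustrate the batching, here is a minimal sketch that splits a list
of documents into fixed-size batches, each of which would then go out as
one http request. BatchPartitioner and partition are hypothetical names
for this example, not our actual code, and the posting step itself is
left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups documents into batches so that each
// batch can be sent to the server in a single request.
public class BatchPartitioner {

    // Splits docs into consecutive batches of at most batchSize items.
    // The last batch may be smaller than batchSize.
    public static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            // Copy the sublist so each batch is independent of the source list
            batches.add(new ArrayList<>(docs.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 4500; i++) {
            docs.add(i);
        }
        List<List<Integer>> batches = partition(docs, 2000);
        System.out.println(batches.size() + " batches; last has "
                + batches.get(batches.size() - 1).size() + " docs");
    }
}
```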

On 8/25/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
>
> On 24-Aug-07, at 2:29 PM, Wu, Daniel wrote:
>
> >> -----Original Message-----
> >> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of 
> >> Yonik Seeley
> >> Sent: Friday, August 24, 2007 2:07 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Embedded about 50% faster for indexing
> >>
> >> One thing I'd like to avoid is everyone trying to embed just for 
> >> performance gains. If there is really that much difference, then we
> >> need a better way for people to get that without resorting to Java 
> >> code.
> >>
> >> -Yonik
> >>
> >
> > Theoretically and practically, embedded solution will be faster than
> > going through http/xml.
>
> This is only true if the http interface adds significant overhead to 
> the cost of indexing a document, and I don't see why this should be 
> so, as indexing is relatively heavyweight.  Setting up the connection 
> could be expensive, but this can be greatly mitigated by sending more 
> than one doc per http request, using persistent connections, and 
> threading.
>
> -Mike
>

-- 
Regards,

Cuong Hoang
