Jython is a Python interpreter implemented in Java. (I have a lot of Python
code.)

Total throughput in the servlet is very sensitive to the total number of
servlet sockets available vs. the number of CPUs.

The different analysers have very different performance.

You might leave some data in the DB, instead of storing it all in the index.

Underlying all this, you have a sneaky network performance problem. Your
successive posts do not reuse a TCP socket. Obvious: opening a new socket
for each post takes time. Not obvious: your server has sockets building up
in TIME_WAIT state. (This means the sockets are shutting down. Having both
ends agree to close the connection is metaphysically difficult. The TCP/IP
spec even has a bug in this area.) Sockets building up can cause TCP
resources to run low or even run out. Your kernel configuration may be weak
in this area.
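
A rough sketch of what reusing one socket could look like from Python, in
case it helps. It keeps a single HTTP connection open and sends each batch
of records as one <add> message. The host, port, update path, and field
names here are assumptions for illustration, not taken from your setup, and
post_batches and build_add_xml are hypothetical helper names.

```python
import http.client
from xml.sax.saxutils import escape, quoteattr


def build_add_xml(records):
    """Build one Solr <add> message holding several documents."""
    parts = ["<add>"]
    for record in records:
        parts.append("<doc>")
        for name, value in record.items():
            # quoteattr adds the surrounding quotes and escapes the name.
            parts.append("<field name=%s>%s</field>"
                         % (quoteattr(name), escape(str(value))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def post_batches(batches, host="localhost", port=8983, path="/solr/update"):
    """POST every batch over one persistent connection, then commit.

    host, port, and path are assumed values; adjust them to your server.
    """
    headers = {"Content-Type": "text/xml; charset=utf-8"}
    conn = http.client.HTTPConnection(host, port)  # HTTP/1.1, keep-alive
    try:
        for records in batches:
            body = build_add_xml(records).encode("utf-8")
            conn.request("POST", path, body=body, headers=headers)
            response = conn.getresponse()
            response.read()  # drain the reply so the socket can be reused
            if response.status != 200:
                raise RuntimeError("update failed: %d" % response.status)
        # One commit at the end instead of one per post.
        conn.request("POST", path, body=b"<commit/>", headers=headers)
        conn.getresponse().read()
    finally:
        conn.close()
```

Calling post_batches with, say, lists of a few hundred rows each would mean
one socket for the whole run instead of one per record, which should also
keep the TIME_WAIT pile from growing.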

Lance

-----Original Message-----
From: Kevin Holmes [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 09, 2007 8:13 AM
To: solr-user@lucene.apache.org
Subject: Any clever ideas to inject into solr? Without http?

I inherited an existing (working) solr indexing script that runs like
this:

 

Python script queries the mysql DB then calls bash script

Bash script performs a curl POST submit to solr

 

We're injecting about 1000 records / minute (constantly), frequently pushing
the edge of our CPU / RAM limitations.

 

I'm in the process of building a Perl script to use DBI and
lwp::simple::post that will perform this all from a single script (instead
of 3).

 

Two specific questions

1: Does anyone have a clever (or better) way to perform this process
efficiently?

 

2: Is there a way to inject into solr without using POST / curl / http?

 

Admittedly, I'm no solr expert - I'm starting from someone else's setup,
trying to reverse-engineer my way out.  Any input would be greatly
appreciated.

