Jython is a Python interpreter implemented in Java. (I have a lot of Python code.)
Total throughput in the servlet is very sensitive to the total number of servlet sockets available vs. the number of CPUs.

The different analysers have very different performance.

You might leave some data in the DB, instead of storing it all in the index.

Underlying all of this, you have a sneaky network performance problem: your successive posts do not reuse a TCP socket.

Obvious: opening a new socket for each post takes time.

Not obvious: your server has sockets building up in the TIME_WAIT state. (This means the sockets are shutting down. Having both ends agree to close the connection is metaphysically difficult. The TCP/IP spec even has a bug in this area.) Sockets building up can cause TCP resources to run low or run out. Your kernel configuration may be weak in this area.

Lance

-----Original Message-----
From: Kevin Holmes [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 09, 2007 8:13 AM
To: solr-user@lucene.apache.org
Subject: Any clever ideas to inject into solr? Without http?

I inherited an existing (working) solr indexing script that runs like this:

Python script queries the mysql DB then calls bash script
Bash script performs a curl POST submit to solr

We're injecting about 1000 records / minute (constantly), frequently pushing the edge of our CPU / RAM limitations.

I'm in the process of building a Perl script to use DBI and lwp::simple::post that will perform this all from a single script (instead of 3).

Two specific questions

1: Does anyone have a clever (or better) way to perform this process efficiently?

2: Is there a way to inject into solr without using POST / curl / http?

Admittedly, I'm no solr expert - I'm starting from someone else's setup, trying to reverse-engineer my way out.

Any input would be greatly appreciated.
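
On the socket-reuse point in the reply: a minimal sketch, in Python 3, of what sending successive posts over one TCP connection could look like, instead of launching curl (and opening a new socket) per post. Nothing here comes from the original script. The helper names build_add_xml and post_batches, the field names, and the text/xml content type are assumptions for illustration, and the sketch only helps if the servlet container keeps HTTP connections alive.

```python
import http.client
from xml.sax.saxutils import escape, quoteattr


def build_add_xml(records):
    """Render a list of {field_name: value} dicts as one Solr <add> message."""
    docs = []
    for record in records:
        fields = "".join(
            "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
            for name, value in record.items()
        )
        docs.append("<doc>%s</doc>" % fields)
    return "<add>%s</add>" % "".join(docs)


def post_batches(conn, path, batches):
    """POST each batch over the same open connection; return the status codes.

    `conn` is expected to behave like http.client.HTTPConnection.
    `path` is the update handler path (an assumption; it depends on the install).
    """
    statuses = []
    for batch in batches:
        body = build_add_xml(batch).encode("utf-8")
        conn.request(
            "POST",
            path,
            body=body,
            headers={"Content-Type": "text/xml; charset=utf-8"},
        )
        response = conn.getresponse()
        # Drain the response; the connection cannot carry the next
        # request until the previous response has been fully read.
        response.read()
        statuses.append(response.status)
    return statuses
```

A caller would create one connection, for example http.client.HTTPConnection("localhost", 8983) (hypothetical host and port), pass it to post_batches with a path such as "/solr/update" (also an assumption) and the rows pulled from the DB grouped into batches, then close it once at the end. Sending many records per <add> message also reduces the number of posts, independently of socket reuse.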