Companies Using Solr

2008-02-21 Thread Clay Webster
Hey Folks,

Reminder: http://wiki.apache.org/solr/PublicServers lists the sites using
Solr.  The listing is a bit thin.  I know many people don't know about the
list or don't have time to add themselves.  I'd like to promote
open-sourcing more systems (like Solr), and this information would help
show that Solr is serving a large community.

Feel free to reply directly to me and I can add you.

Thanks.

--cw

Clay Webster
Associate VP, Platform Infrastructure
CNET, Inc. (Nasdaq:CNET)


Re: Request for graphics

2007-09-29 Thread Clay Webster
'k.  see SOLR-368.

--cw

On 9/28/07, Yonik Seeley [EMAIL PROTECTED] wrote:

 On 9/28/07, Clay Webster [EMAIL PROTECTED] wrote:
  i'm late for dinner out, so i'm just attaching it here.

 Most attachments are stripped :-)

 -Yonik



Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Clay Webster
Condensing the loader into a single executable sounds right if
you have performance problems. ;-)

You could also try adding multiple docs in a single POST if you
notice your problems are with TCP setup time, though if you're
doing localhost connections that overhead should be minimal.
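As a rough sketch of that batching idea (assuming Solr's XML update format; the field names here are hypothetical examples, not from the original thread):

```python
# Sketch: batch several documents into one Solr <add> POST body,
# so one HTTP request carries many docs instead of one each.
import xml.etree.ElementTree as ET

def build_add_payload(records):
    """Build a single <add> body containing one <doc> per record."""
    add = ET.Element("add")
    for rec in records:
        doc = ET.SubElement(add, "doc")
        for name, value in rec.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")

payload = build_add_payload([
    {"id": 1, "title": "first"},
    {"id": 2, "title": "second"},
])
# One POST of `payload` to the update handler replaces two separate posts.
```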

If you're already local to the solr server, you might check out the
CSV slurper. http://wiki.apache.org/solr/UpdateCSV  It's a little
specialized.
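For the CSV handler, the update body is just delimited text with a header row; a minimal sketch of building one (field names hypothetical):

```python
# Sketch: build a body for Solr's CSV update handler
# (http://wiki.apache.org/solr/UpdateCSV).  Field names are illustrative.
import csv
import io

rows = [
    {"id": 1, "title": "first"},
    {"id": 2, "title": "second"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "title"])
writer.writeheader()   # Solr reads field names from the first line
writer.writerows(rows)
csv_body = buf.getvalue()
# POST csv_body to /update/csv with Content-Type: text/csv
```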

And then, of course, there's the question of whether you're doing full
re-indexing or incremental indexing of changes.

--cw


On 8/9/07, Kevin Holmes [EMAIL PROTECTED] wrote:

 I inherited an existing (working) solr indexing script that runs like
 this:



 Python script queries the mysql DB then calls bash script

 Bash script performs a curl POST submit to solr



 We're injecting about 1000 records / minute (constantly), frequently
 pushing the edge of our CPU / RAM limitations.



 I'm in the process of building a Perl script to use DBI and
 lwp::simple::post that will perform this all from a single script
 (instead of 3).



 Two specific questions:

 1: Does anyone have a clever (or better) way to perform this process
 efficiently?



 2: Is there a way to inject into solr without using POST / curl / http?



 Admittedly, I'm no solr expert - I'm starting from someone else's setup,
 trying to reverse-engineer my way out.  Any input would be greatly
 appreciated.
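The "one script instead of three" consolidation Kevin describes might look roughly like this in Python (sqlite3 stands in for MySQL to keep the sketch self-contained; the table, field names, and Solr URL are all hypothetical):

```python
# Sketch: query the database and POST to Solr in a single process,
# instead of Python -> bash -> curl.  sqlite3 stands in for MySQL here,
# and the table/field names and Solr URL are illustrative.
import sqlite3
import urllib.request
from xml.sax.saxutils import escape

def build_add_body(rows):
    """Turn (id, title) rows into one Solr XML <add> body."""
    docs = "".join(
        f'<doc><field name="id">{rid}</field>'
        f'<field name="title">{escape(str(title))}</field></doc>'
        for rid, title in rows
    )
    return f"<add>{docs}</add>"

def index_from_db(db_path, solr_url="http://localhost:8983/solr/update"):
    """Fetch rows and POST them to Solr, all in one script."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT id, title FROM articles").fetchall()
    finally:
        conn.close()
    req = urllib.request.Request(
        solr_url,
        data=build_add_body(rows).encode("utf-8"),
        headers={"Content-Type": "text/xml"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```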




Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Clay Webster
If it's contention between search and indexing, separate them
via a query-slave and an index-master.

--cw

On 8/9/07, David Whalen [EMAIL PROTECTED] wrote:

 What we're looking for is a way to inject *without* using
 curl, or wget, or any other http-based communication.  We'd
 like for the HTTP daemon to only handle search requests, not
 indexing requests on top of them.

 Plus, I have to believe there's a faster way to get documents
 into solr/lucene than using curl.

 _
 david whalen
 senior applications developer
 eNR Services, Inc.
 [EMAIL PROTECTED]
 203-849-7240


  -Original Message-
  From: Clay Webster [mailto:[EMAIL PROTECTED]
  Sent: Thursday, August 09, 2007 11:43 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Any clever ideas to inject into solr? Without http?
 
  Condensing the loader into a single executable sounds right
  if you have performance problems. ;-)
 
  You could also try adding multiple docs in a single post if
  you notice your problems are with tcp setup time, though if
  you're doing localhost connections that should be minimal.
 
  If you're already local to the solr server, you might check
  out the CSV slurper. http://wiki.apache.org/solr/UpdateCSV
  It's a little specialized.
 
  And then there's of course the question of are you doing
  full re-indexing or incremental indexing of changes?
 
  --cw
 
 



Re: To cluster, or not to cluster...

2006-03-24 Thread Clay Webster
On 3/24/06, Robert Haycock [EMAIL PROTECTED] wrote:

 Is it/will it be possible to cluster solr?

 We have a distributed system and it would be nice if we could replicate
 the index to improve performance.


Solr does not have replication.  But it does have a very nice index
distribution system.

Solr can be run in a master/slave setup.  The master receives all the
changes.  For each commit, a snapshot of the index can be made with the
snapshooter script.  The slaves can run snappuller at whatever polling
frequency they like.  Each snapshot is then installed on the slave with
snapinstaller and can have its caches warmed (while the slave keeps
serving queries from the older index).
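The slave-side "pull only if the master has something newer" decision can be modeled like this (the real snappuller/snapinstaller are shell scripts that ship with Solr; this sketch only illustrates the polling logic, and the snapshot names are made up):

```python
# Sketch of a slave's polling decision.  Snapshots are named by timestamp
# (e.g. "snapshot.20060324093000"), so the newest sorts last lexically.
# The real snappuller/snapinstaller are Solr's shell scripts; this just
# models the "pull only if newer" check with illustrative names.

def needs_pull(master_snapshots, installed):
    """True if the master holds a snapshot newer than the installed one."""
    newest = max(master_snapshots, default=None)
    if newest is None:
        return False          # nothing on the master yet
    return installed is None or newest > installed

master = ["snapshot.20060324090000", "snapshot.20060324120000"]
stale = needs_pull(master, "snapshot.20060324090000")    # slave is behind
current = needs_pull(master, "snapshot.20060324120000")  # already current
```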

Slaves can come online with new indexes somewhat out of sync with one
another.  But if your slave hardware is uniform, your pulling and
snapshotting are well understood, and you make warming time-based, it
probably will not be a problem.  Each slave's distribution status is
recorded on the master; that's as tied together as they get (not much).
So, if you have a requirement that they all be in index-version sync, you
could extend Solr to tie them closer.

--cw