On 3/5/06, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> So, I was thinking I could write a driver program that takes in my files and
> then calls the API directly.  Is this doable?

It's doable... While it will be more efficient, it's not clear how much
you will gain, especially if you run with multiple CPUs (IndexWriting is
highly synchronized).

Check out the UpdateHandler abstract class:

  public abstract int addDoc(AddUpdateCommand cmd) throws IOException;
  public abstract void delete(DeleteUpdateCommand cmd) throws IOException;
  public abstract void deleteByQuery(DeleteUpdateCommand cmd) throws IOException;
  public abstract void commit(CommitUpdateCommand cmd) throws IOException;
  public abstract void close() throws IOException;

While the implementation of the UpdateHandler is pluggable, there isn't
a place to plug in different client handlers (like there is with
RequestHandler).

You could create another servlet in the same webapp, get the current
UpdateHandler (SolrCore.updateHandler), and use that to update the index.
It seems there isn't a getter for SolrCore.updateHandler... feel free to
submit a patch if you want to go this route.

You could even drop down to a lower level and use DocumentBuilder to
create your own Lucene Document instances and write them with an
IndexWriter yourself.

-Yonik

> Do you do it all through HTTP requests or through a driver that calls the
> API?
> I think I would prefer the API calls for bulk loading.  Where should I look
> for these?
>
> -Grant
>
> Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 3/5/06, Grant Ingersoll wrote:
> > What/where is the Index Builder that is referred to in
> > http://wiki.apache.org/solr/CollectionBuilding?
>
> It's currently client-supplied (i.e. there isn't one).
>
> Having all Solr users have to write their own builders (code that gets
> data from a source and posts XML documents) certainly isn't optimal.
>
> It would be nice if we could give Solr a database URL with some SQL,
> and have it automatically slurp and index the records.  It would also
> be nice to be able to grab documents from a CSV or other simple
> structured text file and index them.
>
> These ideas are already on the task list on the (currently down) Wiki.
>
> -Yonik