The DataImportHandler has tools for this. It can fetch rows from Oracle and lets you unpack a column's XML content with XPath expressions.
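
In the DataImportHandler this unpacking is set up in its configuration rather than in code, so the following is only a rough sketch of the underlying idea in plain Java: evaluate an XPath expression against XML that has already been read from the column into a String. The class name, the method name, and the sample element names (doc, subject, body) are made up for illustration and are not part of Solr or of the schema under discussion.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XmlColumnUnpacker {

    // Hypothetical helper: returns the text of the first node matched by
    // xpathExpr in the given XML string (for example, CLOB contents that
    // were already read into memory).
    public static String extract(String xml, String xpathExpr) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(xpathExpr, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            // Malformed XML or an invalid expression ends up here.
            throw new IllegalArgumentException("Could not evaluate " + xpathExpr, e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample document standing in for one row's XML column.
        String xml = "<doc><subject>Budget</subject><body>Quarterly numbers.</body></doc>";
        System.out.println(extract(xml, "/doc/subject"));
        System.out.println(extract(xml, "/doc/body"));
    }
}
```

Each extracted value would then map to one field of the Solr document. Because every row is parsed on its own, only one document's XML needs to be held in memory at a time.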

http://wiki.apache.org/solr/DataImportHandler
http://wiki.apache.org/solr/DataImportHandler#Usage_with_RDBMS
http://wiki.apache.org/solr/DataImportHandler#XPathEntityProcessor

On Tue, Mar 16, 2010 at 2:25 PM, Neil Chaudhuri <nchaudh...@potomacfusion.com> wrote:
> That is a great article, David.
>
> For the moment, I am trying an all-Solr approach, but I have run into a small
> problem. The documents are stored as XML CLOB's using Oracle's OPAQUE object.
> Is there any facility to unpack this into the actual text? Or must I execute
> that in the SQL query?
>
> Thanks.
>
>
> -----Original Message-----
> From: Smiley, David W. [mailto:dsmi...@mitre.org]
> Sent: Tuesday, March 16, 2010 4:45 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Moving From Oracle Text Search To Solr
>
> If you do stay with Oracle, please report back to the list how that went. In
> order to get decent filtering and faceting performance, I believe you will
> need to use "bitmapped indexes" which Oracle and some other databases support.
>
> You may want to check out my article on this subject:
> http://www.packtpub.com/article/text-search-your-database-or-solr
>
> ~ David Smiley
> Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
>
>
> On Mar 16, 2010, at 4:13 PM, Neil Chaudhuri wrote:
>
>> Certainly I could use some basic SQL count(*) queries to achieve faceted
>> results, but I am not sure of the flexibility, extensibility, or scalability
>> of that approach. And from what I have read, Oracle Text doesn't do faceting
>> out of the box.
>>
>> Each document is a few MB, and there will be millions of them. I suppose it
>> depends on how I index them. I am pretty sure my current approach of using
>> Hibernate to load all rows, constructing Solr POJO's from them, and then
>> passing the POJO's to the embedded server would lead to a OOM error. I
>> should probably look into the other options.
>>
>> Thanks.
>>
>>
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Tuesday, March 16, 2010 3:58 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Moving From Oracle Text Search To Solr
>>
>> Why do you think you'd hit OOM errors? How big is "very large"? I've
>> indexed, as a single document, a 26 volume encyclopedia of civil war
>> records......
>>
>> Although as much as I like the technology, if I could get away without using
>> two technologies, I would. Are you completely sure you can't get what you
>> want with clever Oracle querying?
>>
>> Best
>> Erick
>>
>> On Tue, Mar 16, 2010 at 3:20 PM, Neil Chaudhuri <
>> nchaudh...@potomacfusion.com> wrote:
>>
>>> I am working on an application that currently hits a database containing
>>> millions of very large documents. I use Oracle Text Search at the moment,
>>> and things work fine. However, there is a request for faceting capability,
>>> and Solr seems like a technology I should look at. Suffice to say I am new
>>> to Solr, but at the moment I see two approaches-each with drawbacks:
>>>
>>>
>>> 1) Have Solr index document metadata (id, subject, date). Then Use
>>> Oracle Text to do a content search based on criteria. Finally, query the
>>> Solr index for all documents whose id's match the set of id's returned by
>>> Oracle Text. That strikes me as an unmanageable Boolean query. (e.g.
>>> id:4 OR id:33432323 OR ...).
>>>
>>> 2) Remove Oracle Text from the equation and use Solr to query document
>>> content based on search criteria. The indexing process though will almost
>>> certainly encounter an OutOfMemoryError given the number and size of
>>> documents.
>>>
>>>
>>> I am using the embedded server and Solr Java APIs to do the indexing and
>>> querying.
>>>
>>>
>>> I would welcome your thoughts on the best way to approach this situation.
>>> Please let me know if I should provide additional information.
>>>
>>>
>>> Thanks.
>>>
>

--
Lance Norskog
goks...@gmail.com