The DataImportHandler has tools for this. It will fetch rows from
Oracle and let you unpack XML columns with XPath expressions.
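
To give a feel for what "unpacking with XPath" means, here is a minimal sketch in plain JDK Java. It is not DataImportHandler itself (DIH does this through its own configuration), and the document shape and the class and method names are made up for illustration, since I do not know your actual CLOB schema.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class ClobXPathSketch {
    // Evaluates an XPath expression against an XML string and returns the
    // string value of the first matching node (empty string if no match).
    static String extractText(String xml, String xpathExpr) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(xpathExpr, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical CLOB content; the real schema is an assumption.
        String clob = "<doc><title>Report</title><body>Full text here</body></doc>";
        System.out.println(extractText(clob, "/doc/title"));
        System.out.println(extractText(clob, "/doc/body"));
    }
}
```

Each extracted string would then map to one Solr field, so the SQL query only has to return the CLOB as text.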

On Tue, Mar 16, 2010 at 2:25 PM, Neil Chaudhuri wrote:
> That is a great article, David.
> For the moment, I am trying an all-Solr approach, but I have run into a small 
> problem. The documents are stored as XML CLOBs using Oracle's OPAQUE object.
> Is there any facility to unpack this into the actual text? Or must I execute 
> that in the SQL query?
> Thanks.
> -----Original Message-----
> From: Smiley, David W.
> Sent: Tuesday, March 16, 2010 4:45 PM
> Subject: Re: Moving From Oracle Text Search To Solr
> If you do stay with Oracle, please report back to the list how that went.  In 
> order to get decent filtering and faceting performance, I believe you will 
> need to use "bitmapped indexes" which Oracle and some other databases support.
> You may want to check out my article on this subject: 
> ~ David Smiley
> Author:
> On Mar 16, 2010, at 4:13 PM, Neil Chaudhuri wrote:
>> Certainly I could use some basic SQL count(*) queries to achieve faceted 
>> results, but I am not sure of the flexibility, extensibility, or scalability 
>> of that approach. And from what I have read, Oracle Text doesn't do faceting 
>> out of the box.
>> Each document is a few MB, and there will be millions of them. I suppose it 
>> depends on how I index them. I am pretty sure my current approach of using 
>> Hibernate to load all rows, constructing Solr POJOs from them, and then
>> passing the POJOs to the embedded server would lead to an OOM error. I
>> should probably look into the other options.
>> Thanks.
>> -----Original Message-----
>> From: Erick Erickson
>> Sent: Tuesday, March 16, 2010 3:58 PM
>> Subject: Re: Moving From Oracle Text Search To Solr
>> Why do you think you'd hit OOM errors? How big is "very large"? I've
>> indexed, as a single document, a 26 volume encyclopedia of civil war
>> records......
>> Although as much as I like the technology, if I could get away without using
>> two technologies, I would. Are you completely sure you can't get what you
>> want with clever Oracle querying?
>> Best
>> Erick
>> On Tue, Mar 16, 2010 at 3:20 PM, Neil Chaudhuri wrote:
>>> I am working on an application that currently hits a database containing
>>> millions of very large documents. I use Oracle Text Search at the moment,
>>> and things work fine. However, there is a request for faceting capability,
>>> and Solr seems like a technology I should look at. Suffice it to say I am new
>>> to Solr, but at the moment I see two approaches, each with drawbacks:
>>> 1) Have Solr index document metadata (id, subject, date). Then use
>>> Oracle Text to do a content search based on criteria. Finally, query the
>>> Solr index for all documents whose ids match the set of ids returned by
>>> Oracle Text. That strikes me as an unmanageable Boolean query (e.g.
>>> id:4 OR id:33432323 OR ...).
>>> 2)      Remove Oracle Text from the equation and use Solr to query document
>>> content based on search criteria. The indexing process though will almost
>>> certainly encounter an OutOfMemoryError given the number and size of
>>> documents.
>>> I am using the embedded server and Solr Java APIs to do the indexing and
>>> querying.
>>> I would welcome your thoughts on the best way to approach this situation.
>>> Please let me know if I should provide additional information.
>>> Thanks.
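
On the OutOfMemoryError worry in the quoted thread: the usual way around it is to stream rows and hand them to the indexer in small batches instead of loading everything first. A rough sketch of that idea follows. The class and method names are hypothetical, and the `sink` stands in for whatever actually sends documents to Solr; in a real indexer the iterator would wrap a scrolling database cursor rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class BatchIndexSketch {
    // Drains rows from an iterator in fixed-size batches so that only one
    // batch is held in memory at a time. Returns the number of rows handled.
    static <T> int indexInBatches(Iterator<T> rows, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int total = 0;
        while (rows.hasNext()) {
            batch.add(rows.next());
            total++;
            if (batch.size() == batchSize) {
                sink.accept(batch);
                // Start a fresh list so the previous batch can be garbage collected.
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // flush the final partial batch
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in rows; real rows would come from a streaming query (assumption).
        List<String> rows = List.of("doc1", "doc2", "doc3", "doc4", "doc5");
        int n = indexInBatches(rows.iterator(), 2, b -> System.out.println("sending " + b));
        System.out.println(n + " rows handled");
    }
}
```

With documents of a few MB each, the batch size would need to be small; the right value depends on heap size and is something to measure rather than guess.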

Lance Norskog
