The DataImportHandler has tools for this. It will fetch rows from
Oracle and let you unpack a column's XML content with XPath expressions.

http://wiki.apache.org/solr/DataImportHandler
http://wiki.apache.org/solr/DataImportHandler#Usage_with_RDBMS
http://wiki.apache.org/solr/DataImportHandler#XPathEntityProcessor
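
For the XML-in-a-CLOB case, a data-config.xml along these lines may be a
starting point. Treat it as an untested sketch: the table DOCUMENTS, the
columns ID and XML_DOC, the /doc/title and /doc/body element paths, and the
connection details are all placeholders I made up, and whether getClobVal()
is the right call depends on how the XMLType column is actually stored. The
outer entity pulls rows over JDBC, and the nested entity runs
XPathEntityProcessor over the CLOB text through a FieldReaderDataSource.

```xml
<dataConfig>
  <!-- JDBC connection to Oracle; host, service name and credentials are placeholders -->
  <dataSource name="db" type="JdbcDataSource"
              driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@//dbhost:1521/ORCL"
              user="solr_reader" password="changeme"/>
  <!-- Lets a nested entity read its input from a field of the parent row -->
  <dataSource name="rowField" type="FieldReaderDataSource"/>

  <document>
    <!-- One Solr document per row; getClobVal() (assumed) serializes the XMLType column to a CLOB -->
    <entity name="doc" dataSource="db" transformer="ClobTransformer"
            query="SELECT d.ID, d.XML_DOC.getClobVal() AS XML_TEXT FROM DOCUMENTS d">
      <field column="ID" name="id"/>
      <!-- ClobTransformer turns the CLOB into a plain String -->
      <field column="XML_TEXT" clob="true"/>

      <!-- Parse that String as XML and pull out fields with XPath -->
      <entity name="xml" dataSource="rowField" processor="XPathEntityProcessor"
              dataField="doc.XML_TEXT" forEach="/doc">
        <field column="title" xpath="/doc/title"/>
        <field column="body" xpath="/doc/body"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The title and body fields would need to exist in schema.xml. Note that
XPathEntityProcessor only supports a subset of XPath, so keep the
expressions simple.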

On Tue, Mar 16, 2010 at 2:25 PM, Neil Chaudhuri
<nchaudh...@potomacfusion.com> wrote:
> That is a great article, David.
>
> For the moment, I am trying an all-Solr approach, but I have run into a small 
> problem. The documents are stored as XML CLOBs using Oracle's OPAQUE object. 
> Is there any facility to unpack this into the actual text? Or must I execute 
> that in the SQL query?
>
> Thanks.
>
>
> -----Original Message-----
> From: Smiley, David W. [mailto:dsmi...@mitre.org]
> Sent: Tuesday, March 16, 2010 4:45 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Moving From Oracle Text Search To Solr
>
> If you do stay with Oracle, please report back to the list how that went.  In 
> order to get decent filtering and faceting performance, I believe you will 
> need to use "bitmapped indexes" which Oracle and some other databases support.
>
> You may want to check out my article on this subject: 
> http://www.packtpub.com/article/text-search-your-database-or-solr
>
> ~ David Smiley
> Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
>
>
> On Mar 16, 2010, at 4:13 PM, Neil Chaudhuri wrote:
>
>> Certainly I could use some basic SQL count(*) queries to achieve faceted 
>> results, but I am not sure of the flexibility, extensibility, or scalability 
>> of that approach. And from what I have read, Oracle Text doesn't do faceting 
>> out of the box.
>>
>> Each document is a few MB, and there will be millions of them. I suppose it 
>> depends on how I index them. I am pretty sure my current approach of using 
>> Hibernate to load all rows, constructing Solr POJOs from them, and then 
>> passing the POJOs to the embedded server would lead to an OOM error. I 
>> should probably look into the other options.
>>
>> Thanks.
>>
>>
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Tuesday, March 16, 2010 3:58 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Moving From Oracle Text Search To Solr
>>
>> Why do you think you'd hit OOM errors? How big is "very large"? I've
>> indexed, as a single document, a 26 volume encyclopedia of civil war
>> records...
>>
>> Although as much as I like the technology, if I could get away without using
>> two technologies, I would. Are you completely sure you can't get what you
>> want with clever Oracle querying?
>>
>> Best
>> Erick
>>
>> On Tue, Mar 16, 2010 at 3:20 PM, Neil Chaudhuri <
>> nchaudh...@potomacfusion.com> wrote:
>>
>>> I am working on an application that currently hits a database containing
>>> millions of very large documents. I use Oracle Text Search at the moment,
>>> and things work fine. However, there is a request for faceting capability,
>>> and Solr seems like a technology I should look at. Suffice it to say I am new
>>> to Solr, but at the moment I see two approaches, each with drawbacks:
>>>
>>>
>>> 1)      Have Solr index document metadata (id, subject, date). Then use
>>> Oracle Text to do a content search based on criteria. Finally, query the
>>> Solr index for all documents whose ids match the set of ids returned by
>>> Oracle Text. That strikes me as an unmanageable Boolean query (e.g.
>>> id:4 OR id:33432323 OR ...).
>>>
>>> 2)      Remove Oracle Text from the equation and use Solr to query document
>>> content based on search criteria. The indexing process though will almost
>>> certainly encounter an OutOfMemoryError given the number and size of
>>> documents.
>>>
>>>
>>>
>>> I am using the embedded server and Solr Java APIs to do the indexing and
>>> querying.
>>>
>>>
>>>
>>> I would welcome your thoughts on the best way to approach this situation.
>>> Please let me know if I should provide additional information.
>>>
>>>
>>>
>>> Thanks.
>>>



-- 
Lance Norskog
goks...@gmail.com
