Besides the other notes here, I agree you'll hit OOM if you try to read all the rows into memory at once, but I'm absolutely sure you can read them N at a time instead. Not that I could tell you how, mind you...
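For what it's worth, the general shape of "N at a time" might look something like the sketch below. It is plain Java with no Hibernate or Solr calls in it. BatchIndexer, indexInBatches, and the sink are made-up names for illustration only. In a real indexer the iterator would presumably wrap some kind of database cursor, and the sink would send each batch to Solr and commit now and then, but those parts are assumptions and are not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: drains a row source in fixed-size batches. */
final class BatchIndexer {

    /**
     * Pulls rows from the iterator and hands them to the sink at most
     * batchSize at a time, so only one batch is referenced here at once.
     * Returns the total number of rows handed to the sink.
     */
    static <T> int indexInBatches(Iterator<T> rows, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int total = 0;
        while (rows.hasNext()) {
            batch.add(rows.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);                 // assumption: the sink would index this batch
                total += batch.size();
                batch = new ArrayList<>(batchSize); // fresh list; the old batch can be collected
            }
        }
        if (!batch.isEmpty()) {                     // flush the final, partial batch
            sink.accept(batch);
            total += batch.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in for a database cursor: five fake "rows".
        Iterator<Integer> fakeRows = Arrays.asList(1, 2, 3, 4, 5).iterator();
        int n = indexInBatches(fakeRows, 2, b -> System.out.println("indexing batch " + b));
        System.out.println("rows handed to the sink: " + n);
    }
}
```

The point is only that memory use is bounded by the batch size rather than the table size, provided the row source itself streams instead of loading everything up front.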
You're on your way...

Erick

On Tue, Mar 16, 2010 at 4:13 PM, Neil Chaudhuri <nchaudh...@potomacfusion.com> wrote:

> Certainly I could use some basic SQL count(*) queries to achieve faceted
> results, but I am not sure of the flexibility, extensibility, or scalability
> of that approach. And from what I have read, Oracle Text doesn't do faceting
> out of the box.
>
> Each document is a few MB, and there will be millions of them. I suppose it
> depends on how I index them. I am pretty sure my current approach of using
> Hibernate to load all rows, constructing Solr POJO's from them, and then
> passing the POJO's to the embedded server would lead to an OOM error. I
> should probably look into the other options.
>
> Thanks.
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, March 16, 2010 3:58 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Moving From Oracle Text Search To Solr
>
> Why do you think you'd hit OOM errors? How big is "very large"? I've
> indexed, as a single document, a 26 volume encyclopedia of civil war
> records......
>
> Although as much as I like the technology, if I could get away without
> using two technologies, I would. Are you completely sure you can't get
> what you want with clever Oracle querying?
>
> Best
> Erick
>
> On Tue, Mar 16, 2010 at 3:20 PM, Neil Chaudhuri
> <nchaudh...@potomacfusion.com> wrote:
>
> > I am working on an application that currently hits a database containing
> > millions of very large documents. I use Oracle Text Search at the moment,
> > and things work fine. However, there is a request for faceting capability,
> > and Solr seems like a technology I should look at. Suffice to say I am new
> > to Solr, but at the moment I see two approaches, each with drawbacks:
> >
> > 1) Have Solr index document metadata (id, subject, date). Then use
> > Oracle Text to do a content search based on criteria. Finally, query the
> > Solr index for all documents whose id's match the set of id's returned by
> > Oracle Text. That strikes me as an unmanageable Boolean query (e.g.
> > id:4 OR id:33432323 OR ...).
> >
> > 2) Remove Oracle Text from the equation and use Solr to query document
> > content based on search criteria. The indexing process though will almost
> > certainly encounter an OutOfMemoryError given the number and size of
> > documents.
> >
> > I am using the embedded server and Solr Java APIs to do the indexing and
> > querying.
> >
> > I would welcome your thoughts on the best way to approach this situation.
> > Please let me know if I should provide additional information.
> >
> > Thanks.