Besides the other notes here, I agree you'll hit OOM if you try to
read all the rows into memory at once, but I'm absolutely sure you
can read them N at a time instead. Not that I could tell you how, mind
you.....
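
FWIW, the general shape is something like the following. This is just a hypothetical sketch, not something I've run against your setup: with Hibernate, fetchPage would be a query using setFirstResult/setMaxResults, and indexBatch would be where you build your POJOs and call SolrServer.add (with a commit every so often).

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

public class BatchIndexer {

    /**
     * Pages through the source N rows at a time instead of loading
     * everything up front. Each page is handed to the indexer and then
     * dropped, so the previous page can be garbage collected.
     * Returns the total number of rows processed.
     */
    public static <T> int indexInBatches(
            BiFunction<Integer, Integer, List<T>> fetchPage, // (offset, limit) -> rows
            Consumer<List<T>> indexBatch,                    // e.g. build docs, server.add(...)
            int batchSize) {
        int offset = 0;
        int total = 0;
        while (true) {
            List<T> batch = fetchPage.apply(offset, batchSize);
            if (batch.isEmpty()) {
                break; // no rows left
            }
            indexBatch.accept(batch);
            total += batch.size();
            offset += batchSize;
        }
        return total;
    }
}
```

If you do this with Hibernate, you'll also want to clear the session between batches so the first-level cache doesn't grow without bound and put you right back at OOM.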

You're on your way...
Erick

On Tue, Mar 16, 2010 at 4:13 PM, Neil Chaudhuri <
nchaudh...@potomacfusion.com> wrote:

> Certainly I could use some basic SQL count(*) queries to achieve faceted
> results, but I am not sure of the flexibility, extensibility, or scalability
> of that approach. And from what I have read, Oracle Text doesn't do faceting
> out of the box.
>
> Each document is a few MB, and there will be millions of them. I suppose it
> depends on how I index them. I am pretty sure my current approach of using
> Hibernate to load all rows, constructing Solr POJOs from them, and then
> passing the POJOs to the embedded server would lead to an OOM error. I
> should probably look into the other options.
>
> Thanks.
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, March 16, 2010 3:58 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Moving From Oracle Text Search To Solr
>
> Why do you think you'd hit OOM errors? How big is "very large"? I've
> indexed, as a single document, a 26-volume encyclopedia of civil war
> records......
>
> As much as I like the technology, though, if I could get away without using
> two technologies, I would. Are you completely sure you can't get what you
> want with clever Oracle querying?
>
> Best
> Erick
>
> On Tue, Mar 16, 2010 at 3:20 PM, Neil Chaudhuri <
> nchaudh...@potomacfusion.com> wrote:
>
> > I am working on an application that currently hits a database containing
> > millions of very large documents. I use Oracle Text Search at the moment,
> > and things work fine. However, there is a request for faceting
> capability,
> > and Solr seems like a technology I should look at. Suffice to say I am
> new
> > to Solr, but at the moment I see two approaches-each with drawbacks:
> >
> >
> > 1)      Have Solr index document metadata (id, subject, date). Then use
> > Oracle Text to do a content search based on criteria. Finally, query the
> > Solr index for all documents whose ids match the set of ids returned by
> > Oracle Text. That strikes me as an unmanageable Boolean query (e.g.
> > id:4 OR id:33432323 OR ...).
> >
> > 2)      Remove Oracle Text from the equation and use Solr to query
> document
> > content based on search criteria. The indexing process though will almost
> > certainly encounter an OutOfMemoryError given the number and size of
> > documents.
> >
> >
> >
> > I am using the embedded server and Solr Java APIs to do the indexing and
> > querying.
> >
> >
> >
> > I would welcome your thoughts on the best way to approach this situation.
> > Please let me know if I should provide additional information.
> >
> >
> >
> > Thanks.
> >
>
