Hi,

1) Look for "multicore" on the Solr Wiki.  A single Solr instance can host 
multiple cores, each with its own index.

2) I meant that you would not index it all into one index (that's what you 
wanted to do, no?).  So in your app you'd do something like this:

ts = doc.getTimestamp();
indexer = getIndexer(ts); // returns a different indexer depending on ts
indexer.index(doc);

Your app keeps track of all the indexers (e.g. all the Solr client instances 
it holds, each of which points to a different Solr server/core/index).
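
The routing step above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the Indexer interface, the TimePartitionRouter class, and the "events_yyyyMMdd" core naming scheme are all hypothetical, and a real Indexer would wrap whatever Solr client your app uses.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical indexer abstraction; a real one would wrap a Solr client. */
interface Indexer {
    void index(String docId);
}

/** Routes each document to a per-day indexer, creating indexers lazily. */
class TimePartitionRouter {
    // One partition per UTC day (assumption: daily partitions, UTC boundaries).
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // Core name -> indexer; this is the "keep track of all the indexers" part.
    private final Map<String, Indexer> indexers = new ConcurrentHashMap<>();
    // Called once per new core name, e.g. to create the core and its client.
    private final Function<String, Indexer> factory;

    TimePartitionRouter(Function<String, Indexer> factory) {
        this.factory = factory;
    }

    /** Maps a timestamp to a core name such as "events_20090326" (assumed scheme). */
    static String coreNameFor(long epochMillis) {
        return "events_" + DAY.format(Instant.ofEpochMilli(epochMillis));
    }

    /** Returns the indexer for the day that contains the given timestamp. */
    Indexer getIndexer(long epochMillis) {
        return indexers.computeIfAbsent(coreNameFor(epochMillis), factory);
    }
}
```

Two documents from the same day get the same indexer, and the first document of a new day triggers the factory, which is where your app would decide when and where to create the new core.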


If your issue is large indices and search performance, then the solution is not 
so much multiple Solr cores/indices per machine as distributing the index 
across multiple servers.  Look at the DistributedSearch page on the Wiki.
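
Since all your queries are time based, one way to search only the relevant partitions is to build the "shards" request parameter that distributed search uses from the query's date range. The sketch below assumes the same hypothetical "events_yyyyMMdd" core naming and made-up host names; only the comma-separated host:port/path format of the parameter comes from Solr itself.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Builds a distributed-search "shards" value for a range of daily partitions. */
class ShardsParam {
    // Formats a LocalDate as yyyyMMdd, e.g. 20090326.
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE;

    /**
     * Lists every host's core for each day in [from, to], inclusive.
     * Host names and the "events_" core prefix are illustrative assumptions.
     */
    static String forRange(List<String> hosts, LocalDate from, LocalDate to) {
        List<String> shards = new ArrayList<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            for (String host : hosts) {
                shards.add(host + "/solr/events_" + DAY.format(d));
            }
        }
        return String.join(",", shards);
    }
}
```

The returned string would be passed as the shards parameter on a normal query to any one of the cores, so a query covering two days on two boxes touches four cores instead of six months of data.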

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: vivek sar <vivex...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Thursday, March 26, 2009 12:58:04 PM
> Subject: Re: Partition index by time using Solr
> 
> Thanks Otis for the response. I'm still not clear on a few things,
> 
> 1) I thought Solr can work with only one index at a time. In order to
> have multiple indexes you need multiple instances of Solr - isn't that
> right? How can we make Solr read from and write to multiple
> indexes?
> 
> 2) What does it mean by "partitioning outside of Solr"? If all the
> data is indexed by Solr into one index - how would one partition it
> outside Solr that is still searchable by Solr when needed?
> 
> Our main problem is scaling with Solr. Our indexes grow so big (like
> 10G-20G every day) that it's hard to optimize them and search on large
> indexes. That's why we are trying to partition them by time. We do
> need to keep up to 6 months of data.
> 
> The only way I can think of limiting the index size is by running
> multiple Solr instances, but even then it's not a scalable solution if
> the indexes keep growing.
> 
> Thanks,
> -vivek
> 
> 
> On Wed, Mar 25, 2009 at 6:59 PM, Otis Gospodnetic
> wrote:
> >
> > Hi,
> >
> > Yes, you can use Solr for this, but index partitioning should be done
> > outside of Solr.  That is, your app will need to know where to send each
> > doc based on its timestamp, when and where to create a new index (new Solr
> > core), and so on.  Similarly, deleting docs older than N days is done by
> > you, using a delete by query with a date-based open-ended range query.  The
> > Solr setup is really done the same as usual, since all the
> > partitioning-related stuff lives outside of Solr.  Of course, you could
> > come up with a "Solr Proxy" component that abstracts some/all of this and
> > pretends to be Solr.
> >
> >
> > Otis --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> >
> > ----- Original Message ----
> >> From: vivek sar 
> >> To: solr-user@lucene.apache.org
> >> Sent: Wednesday, March 25, 2009 3:52:11 PM
> >> Subject: Partition index by time using Solr
> >>
> >> Hi,
> >>
> >>   I've used Lucene before, but I'm new to Solr. I've gone through the
> >> mailing list, but was unable to find any clear idea on how to partition
> >> Solr indexes. Here is what we want,
> >>
> >>   1) Be able to partition indexes by timestamp - basically partition
> >> per day (create a new index directory every day)
> >>
> >>   2) Be able to search partitions based on timestamp. All our queries
> >> are time based, so instead of looking into all the partitions I want
> >> to go directly to the partitions where the data might be.
> >>
> >>   3) Be able to purge any data older than 6 months without bringing
> >> down the application. Since partitions would be marked by timestamp,
> >> we would just have to delete the old partitions.
> >>
> >>
> >>   This is going to be a distributed system with 2 boxes each running
> >> an instance of Solr. I don't want to replicate data, but each box may
> >> have the same timestamp partition with different data. We would be
> >> indexing an avg of 20 million documents (each document = 500 bytes)
> >> with an estimate of 10g in index size - evenly distributed across
> >> machines (each machine would get roughly 5g of index every day).
> >>
> >>   My questions,
> >>
> >>   1) Is this all possible using Solr? If not, should I just do this
> >> using Lucene or is there any other out-of-box alternative?
> >>   2) If it's possible in Solr how do we do this - configuration, setup etc.
> >>   3) How would I optimize the partitions - would it be required when using Solr?
> >>
> >>   Thanks,
> >>   -vivek
> >
> >
