Hi Michael,

We are storing all our data in addition to indexing it, since we need to display
those values to the user. So unfortunately we cannot go with stored=false, which
might otherwise have solved our issue.
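
For context, our fields are defined along these lines in schema.xml (the
field and type names below are only illustrative, not our exact schema):

  <!-- stored="true" is what lets Solr return the raw value so that we can
       display it in the UI; indexed="true" is what makes it searchable -->
  <field name="log_message" type="text" indexed="true" stored="true"/>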

Appreciate any other pointers/suggestions.

Thanks,
sS

--- On Fri, 9/25/09, Michael <solrco...@gmail.com> wrote:

> From: Michael <solrco...@gmail.com>
> Subject: Re: Can we point a Solr server to index directory dynamically at runtime..
> To: solr-user@lucene.apache.org
> Date: Friday, September 25, 2009, 2:00 PM
> Are you storing (in addition to indexing) your data?  Perhaps you could
> turn off storage on data older than 7 days (requires reindexing), thus
> losing the ability to return snippets but cutting down on your storage
> space and server count.  I've experienced a 10x decrease in space
> requirements and a large boost in speed after cutting extraneous storage
> from Solr -- the stored data is mixed in with the index data, and so it
> slows down searches.
> 
> You could also put all 200G onto one Solr instance rather than 10 for
> the >7-days data, and accept that those searches will be slower.
> 
> Michael
> 
> On Fri, Sep 25, 2009 at 1:34 AM, Silent Surfer <silentsurfe...@yahoo.com> wrote:
> 
> > Hi,
> >
> > Thank you, Michael and Chris, for the responses.
> >
> > Today, after Michael's mail, we tested the dynamic loading of cores and
> > it worked well. So we need to go with the hybrid approach of multicore
> > and distributed searching.
> >
> > As per our testing, we found that a Solr instance with 20 GB of index
> > (single index or spread across multiple cores) performs better than a
> > Solr instance with, say, 40 or 50 GB of index (again, single index or
> > index spread across cores).
> >
> > So the 200 GB of index for day 1 will be spread across 200/20 = 10 Solr
> > slave instances.
> >
> > For day 2's data, 10 more Solr slave servers are required; cumulative
> > Solr slave instances = 200*2/20 = 20
> > ...
> > For day 30's data, 10 more Solr slave servers are required; cumulative
> > Solr slave instances = 200*30/20 = 300
> >
> > So with the above approach, we may need ~300 Solr slave instances, which
> > becomes very unmanageable.
> >
> > But we know that most of the queries are for the past week, i.e. we
> > definitely need the 70 Solr slaves containing the last 7 days' worth of
> > data up and running.
> >
> > Now, for the remaining 230 Solr instances, do we need to keep them
> > running just for the odd query that spans all 30 days of data
> > (30*200 GB = 6 TB), which may come up only a couple of times a day?
> > This linear increase of Solr servers with the retention period doesn't
> > seem to be a very scalable solution.
> >
> > So we are looking for a simpler approach to handle this scenario.
> >
> > Appreciate any further inputs/suggestions.
> >
> > Regards,
> > sS
> >
> > --- On Fri, 9/25/09, Chris Hostetter <hossman_luc...@fucit.org> wrote:
> >
> > > From: Chris Hostetter <hossman_luc...@fucit.org>
> > > Subject: Re: Can we point a Solr server to index directory dynamically at runtime..
> > > To: solr-user@lucene.apache.org
> > > Date: Friday, September 25, 2009, 4:04 AM
> > > : Using a multicore approach, you could send a "create a core named
> > > : 'core3weeksold' pointing to '/datadirs/3weeksold'" command to a live
> > > : Solr, which would spin it up on the fly.  Then you query it, and maybe
> > > : keep it spun up until it's not queried for 60 seconds or something,
> > > : then send a "remove core 'core3weeksold'" command.
> > > : See http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler .
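
This is essentially what we tested after Chris's mail; the CoreAdmin calls
involved look roughly like the following (host, port and paths here are
placeholders, not our actual setup):

  curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=core3weeksold&instanceDir=/datadirs/3weeksold"

and, once the core is no longer needed:

  curl "http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core3weeksold"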
> > >
> > > something that seems implicit in the question is what to do when the
> > > request spans all of the data ... this is where (in theory) distributed
> > > searching could help you out.
> > >
> > > index each day's worth of data into its own core; that makes it really
> > > easy to expire the old data (just UNLOAD and delete an entire core once
> > > it's more than 30 days old).  if your user is only searching "current"
> > > data then your app can directly query the core containing the most
> > > current data -- but if they want to query the last week, or last two
> > > weeks worth of data, you do a distributed request for all of the shards
> > > needed to search the appropriate amount of data.
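
If I understand the distributed search part correctly, such a request would
use the standard shards parameter and look something like this (host names
are made up; the core names follow the per-day naming Hoss suggests below):

  curl "http://solr-host1:8983/solr/today/select?q=error&shards=solr-host1:8983/solr/today,solr-host2:8983/solr/1dayold,solr-host3:8983/solr/2dayold"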
> > >
> > > Between the ALIAS and SWAP commands on the CoreAdmin screen it should
> > > be pretty easy to have cores with names like "today", "1dayold",
> > > "2dayold" so that your app can configure simple shard params for all
> > > the permutations you'll need to query.
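
For the daily rollover, my understanding is that the core names can be
rotated with the SWAP action, swapping each adjacent pair from oldest to
newest; a sketch with a placeholder host:

  curl "http://localhost:8983/solr/admin/cores?action=SWAP&core=1dayold&other=today"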
> > >
> > >
> > > -Hoss