Re: Date faceting - howto improve performance

Otis Gospodnetic Sat, 25 Apr 2009 06:47:22 -0700

I should emphasize that the ParallelReader (PR) trick I mentioned is something you'd do at the
Lucene level, outside Solr. You'd then just slip the modified index back
into Solr.
Or, if you like the bleeding edge, perhaps you can make use of Ning Li's Solr
index merging functionality (patch in JIRA).



Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Otis Gospodnetic <otis_gospodne...@yahoo.com>
> To: solr-user@lucene.apache.org
> Sent: Saturday, April 25, 2009 9:41:45 AM
> Subject: Re: Date faceting - howto improve performance
> 
> 
> Yes, you could simply round the date. There's no need for a non-date field type.
> Yes, you can add a field after the fact by using ParallelReader and
> merging. I don't recall the details, but search the mailing list for ParallelReader and
> Andrzej. I remember he once provided a working recipe.
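Rounding the date here means truncating each publishedDate to midnight before it is sent to Solr, so that every post from the same day shares one indexed term. The following is a minimal client-side sketch under two assumptions: the source timestamps are timezone-aware, and days are bucketed in UTC (bucketing by local day would need a different offset). Solr's date field expects a restricted ISO 8601 form such as 1995-12-31T23:59:59Z, with a literal T and a trailing Z, so the sketch emits that form. round_to_day and day_string are hypothetical helper names, not part of any Solr API.

```python
from datetime import datetime, timedelta, timezone

def round_to_day(ts: datetime) -> str:
    """Midnight UTC of ts's day, rendered in the form Solr's date field expects."""
    if ts.tzinfo is None or ts.utcoffset() is None:
        # Naive datetimes are ambiguous; Solr dates are always UTC.
        raise ValueError("expected a timezone-aware datetime")
    day = ts.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    return day.strftime("%Y-%m-%dT%H:%M:%SZ")

def day_string(ts: datetime) -> str:
    """The same day as a plain yyyy-MM-dd string, for a string-typed field."""
    return round_to_day(ts)[:10]

if __name__ == "__main__":
    posted = datetime(2009, 4, 25, 6, 54, 2, tzinfo=timezone.utc)
    print(round_to_day(posted))  # 2009-04-25T00:00:00Z
    print(day_string(posted))    # 2009-04-25
    # 01:00 at UTC+2 is still the previous day in UTC.
    cest = timezone(timedelta(hours=2))
    print(round_to_day(datetime(2009, 4, 25, 1, 0, tzinfo=cest)))  # 2009-04-24T00:00:00Z
```

Doing the rounding before indexing keeps the field a real date, so date range queries and date faceting keep working on it.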
> 
> 
> 
> 
> 
> ----- Original Message ----
> > From: Marcus Herou 
> > To: solr-user@lucene.apache.org
> > Sent: Saturday, April 25, 2009 6:54:02 AM
> > Subject: Date faceting - howto improve performance
> > 
> > Hi.
> > 
> > One of our faceting use-cases:
> > We are creating trend graphs of how many blog posts contain a certain
> > term, grouped by day/week/year etc., using the nice DateMathParser
> > functions.
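Trend requests of this kind are usually expressed with Solr's date faceting parameters (facet.date, facet.date.start, facet.date.end, facet.date.gap). The sketch below only builds the query string. publishedDate is the field named in this message; the query text, the 30-day window, and the string field publishedDay (for the alternative raised in question 2 below) are hypothetical. One pitfall it avoids: the "+" in a gap such as +1DAY has to be URL-encoded as %2B, otherwise it is decoded as a space.

```python
from urllib.parse import urlencode

def date_facet_params(query, field="publishedDate", days=30):
    """Query string for a per-day trend over the last `days` days,
    using Solr's date faceting (facet.date.*) parameters."""
    return urlencode([
        ("q", query),
        ("rows", "0"),                                  # counts only, no documents
        ("facet", "true"),
        ("facet.date", field),
        ("facet.date.start", "NOW/DAY-%dDAYS" % days),  # DateMathParser syntax
        ("facet.date.end", "NOW/DAY+1DAY"),
        ("facet.date.gap", "+1DAY"),                    # one bucket per day
    ])

def field_facet_params(query, field="publishedDay"):
    """Alternative for a hypothetical yyyy-MM-dd string field:
    ordinary field faceting, one count per distinct day string."""
    return urlencode([
        ("q", query),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", "-1"),     # the default limit is 100 terms; -1 means no limit
        ("facet.mincount", "1"),   # omit days with no matching posts
        ("facet.sort", "false"),   # term order, which is chronological for yyyy-MM-dd
    ])

if __name__ == "__main__":
    print(date_facet_params("body:solr"))
    print(field_facet_params("body:solr"))
```

Either string would be appended to a select URL such as http://localhost:8983/solr/select? (host and port are assumptions). The string-field variant answers question 2 in principle: facet.field works on string fields, and with yyyy-MM-dd values the term order is also the date order.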
> > 
> > Performance degrades very quickly, and memory consumption grows until we get
> > an OutOfMemoryError from time to time.
> > We think this is because the cardinality of the publishedDate field
> > in our index is huge, almost equal to the number of documents in the index.
> > 
> > We need to address that...
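To make the cardinality argument concrete, the sketch below counts how many distinct values a date field would hold when timestamps are stored to the second versus truncated to the day. The data is synthetic and hypothetical (one post every ten minutes for one week), chosen only to show the effect: at second precision every post gets its own term, while at day precision the same posts collapse into seven.

```python
from datetime import datetime, timedelta, timezone

SECOND = "%Y-%m-%dT%H:%M:%SZ"  # full precision, as publishedDate is stored today
DAY = "%Y-%m-%d"               # day precision, as proposed in question 2

def distinct_terms(timestamps, fmt):
    """Number of distinct indexed values if each timestamp is rendered with fmt."""
    return len({ts.strftime(fmt) for ts in timestamps})

# Hypothetical data: one post every 600 seconds for 7 days, starting at midnight UTC.
start = datetime(2009, 4, 18, tzinfo=timezone.utc)
posts = [start + timedelta(seconds=s) for s in range(0, 7 * 86400, 600)]

if __name__ == "__main__":
    print(len(posts))                    # 1008
    print(distinct_terms(posts, SECOND)) # 1008
    print(distinct_terms(posts, DAY))    # 7
```

With real blog data the reduction is far larger, since the number of distinct days grows only with the time span covered, not with the number of posts.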
> > 
> > Some questions:
> > 
> > 1. Can a date field use a date format other than the default yyyy-MM-dd
> > HH:mm:ssZ?
> > 
> > 2. We are thinking of adding a field to the index with the format
> > yyyy-MM-dd to reduce the cardinality. If that field can't be a date, it
> > could perhaps be a string, but can faceting then be used
> > on it?
> > 
> > 3. Since we already have such a huge index, is there a way to add a
> > field afterwards and apply it to all documents without actually reindexing
> > the whole shebang?
> > 
> > 4. If the field cannot be a string, can we just leave out the
> > hour/minute/second information (set it to zero) to reduce the cardinality and improve
> > performance? Example: 2009-01-01 00:00:00Z
> > 
> > 5. I am afraid that we need to reindex everything to get this to work
> > (which would make Q3 moot). We currently have 8 shards. What would be the most
> > efficient way to reindex the whole shebang? Dump the entire database to disk
> > (sigh), create many XML file splits and post them with curl, distributing them
> > across the servers in a random or hash(numServers) manner?
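A hedged sketch of the hash-based variant of that plan: route each document to a shard from a stable hash of its id, and group the documents into Solr XML update messages (add/doc/field elements) that can be written to files and posted with curl. The dict-based document shape, the "id" key, the batch size, and the helper names are assumptions for illustration; only the shard count of 8 comes from the message.

```python
import hashlib
from xml.sax.saxutils import escape, quoteattr

NUM_SHARDS = 8  # from the message; adjust if the shard count changes

def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Stable shard number for a document id.
    Python's built-in hash() of a str is randomized per process, so an MD5
    digest is used instead to keep the mapping identical across runs."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def add_xml(docs):
    """Render a list of dicts as one Solr <add> update message."""
    out = ["<add>"]
    for doc in docs:
        out.append("<doc>")
        for name, value in doc.items():
            out.append("<field name=%s>%s</field>" % (quoteattr(name), escape(str(value))))
        out.append("</doc>")
    out.append("</add>")
    return "".join(out)

def batches_by_shard(docs, num_shards=NUM_SHARDS, batch_size=1000):
    """Yield (shard, xml) pairs with at most batch_size documents per message."""
    pending = [[] for _ in range(num_shards)]
    for doc in docs:
        shard = shard_for(doc["id"], num_shards)
        pending[shard].append(doc)
        if len(pending[shard]) == batch_size:
            yield shard, add_xml(pending[shard])
            pending[shard] = []
    # Flush whatever is left for each shard.
    for shard, rest in enumerate(pending):
        if rest:
            yield shard, add_xml(rest)
```

Each yielded message could be written to a file and posted to its shard with something like curl http://<shard-host>:8983/solr/update --data-binary @batch.xml -H 'Content-type:text/xml; charset=utf-8', followed by a <commit/> (host name and port are placeholders). Hashing rather than picking a server at random means a later update to the same document goes to the same shard, so one uniqueKey does not end up on two shards.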
> > 
> > 
> > Kind regards
> > 
> > //Marcus
> > 
> > -- 
> > Marcus Herou CTO and co-founder Tailsweep AB
> > +46702561312
> > marcus.he...@tailsweep.com
> > http://www.tailsweep.com/
> > http://blogg.tailsweep.com/