Yes, you could simply round the date; there is no need for a non-date field type.
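A minimal sketch of rounding on the client before the document is sent to Solr. It assumes publish times arrive as UTC strings in Solr's canonical format; `RoundToDay` and `roundToDay` are hypothetical names. Using UTC midnight as the day boundary is also an assumption; another time zone would need a different cut-off.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class RoundToDay {
    // Hypothetical helper: truncates a timestamp to midnight UTC of the same day,
    // so every post published on that day shares one indexed value.
    static String roundToDay(String solrDate) {
        Instant instant = Instant.parse(solrDate);               // e.g. "2009-04-25T06:54:02Z"
        return instant.truncatedTo(ChronoUnit.DAYS).toString();  // back to yyyy-MM-ddTHH:mm:ssZ
    }

    public static void main(String[] args) {
        System.out.println(roundToDay("2009-04-25T06:54:02Z")); // 2009-04-25T00:00:00Z
    }
}
```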
Yes, you can add a field after the fact by using ParallelReader and merging.
I don't recall the details, but search the mailing list for ParallelReader and
Andrzej; I remember he once posted a working recipe.


Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Marcus Herou <marcus.he...@tailsweep.com>
> To: solr-user@lucene.apache.org
> Sent: Saturday, April 25, 2009 6:54:02 AM
> Subject: Date faceting - howto improve performance
> 
> Hi.
> 
> One of our faceting use-cases:
> We are creating trend graphs of how many blog posts contain a certain term,
> grouped by day/week/year etc. using the nice DateMathParser
> functions.
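For reference, a per-day trend like this maps onto a date facet request. The sketch below only builds the query string, assuming the Solr 1.3/1.4-era `facet.date*` parameters and the `publishedDate` field; the class name, the 30-day window and the search term are illustrative.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class DateFacetQuery {
    // Builds the query string for a per-day count of posts matching a term.
    static String build(String term, String dateField, int days) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", term);
        params.put("rows", "0");                                  // only the facet counts are needed
        params.put("facet", "true");
        params.put("facet.date", dateField);
        params.put("facet.date.start", "NOW/DAY-" + days + "DAYS");
        params.put("facet.date.end", "NOW/DAY+1DAY");             // include today
        params.put("facet.date.gap", "+1DAY");                    // e.g. +7DAYS or +1YEAR for other buckets
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            // URL-encode values: a raw '+' in the gap would be decoded as a space
            sb.append(e.getKey()).append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("solr", "publishedDate", 30));
    }
}
```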
> 
> The performance degrades really fast and consumes a lot of memory, which
> forces an OOM from time to time.
> We think it is due to the fact that the cardinality of the field publishedDate
> in our index is huge, almost equal to the number of documents in the index.
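The cardinality argument can be checked with a toy count: rounding to the day collapses every post from the same day into a single value. The timestamps are made-up sample data and `Cardinality.distinct` is a hypothetical helper.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Cardinality {
    // Counts distinct field values, optionally after truncating to the UTC day.
    static int distinct(List<String> dates, boolean roundToDay) {
        Set<Instant> values = new HashSet<>();
        for (String d : dates) {
            Instant t = Instant.parse(d);
            values.add(roundToDay ? t.truncatedTo(ChronoUnit.DAYS) : t);
        }
        return values.size();
    }

    public static void main(String[] args) {
        List<String> posts = List.of(
            "2009-04-24T08:15:00Z", "2009-04-24T13:02:41Z",
            "2009-04-25T06:54:02Z", "2009-04-25T21:30:10Z");
        System.out.println(distinct(posts, false)); // 4
        System.out.println(distinct(posts, true));  // 2
    }
}
```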
> 
> We need to address that...
> 
> Some questions:
> 
> 1. Can a date field have other date formats than the default of
> yyyy-MM-ddTHH:mm:ssZ?
> 
> 2. We are thinking of adding a field to the index which has the format
> yyyy-MM-dd to reduce the cardinality. If that field can't be a date, it
> could perhaps be a string, but can faceting then be used on it?
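If a string field were used instead, the day key could be produced as below. This is a sketch assuming UTC days; `DayKey` is a hypothetical class, and `publishedDay` would be a hypothetical string field faceted with `facet.field=publishedDay`. Zero-padded yyyy-MM-dd strings sort in date order, so range queries on them still behave.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class DayKey {
    // Formats an instant as yyyy-MM-dd in UTC; the zone is required to format an Instant.
    private static final DateTimeFormatter DAY =
        DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    static String dayKey(String solrDate) {
        return DAY.format(Instant.parse(solrDate));
    }

    public static void main(String[] args) {
        System.out.println(dayKey("2009-04-25T06:54:02Z")); // 2009-04-25
    }
}
```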
> 
> 3. Since we already have such a huge index, is there a way to add a
> field afterwards and apply it to all documents without reindexing
> the whole shebang?
> 
> 4. If the field cannot be a string, can we just leave out the
> hour/minute/second information to reduce the cardinality and improve
> performance? Example: 2009-01-01T00:00:00Z
> 
> 5. I am afraid that we need to reindex everything to get this to work
> (negating Q3). We currently have 8 shards; what would be the most efficient
> way to reindex the whole shebang? Dump the entire database to disk
> (sigh), create many XML file splits and use curl on them in a
> random/hash(numServers) manner?
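For the hash-based split, a sketch of deterministic routing by document id, assuming each document has a stable string id. The ids are made up, and writing each batch to an XML update file and posting it with curl is left out. Hashing rather than random assignment means a given id always lands on the same shard, as long as the shard count stays at 8.

```java
import java.util.ArrayList;
import java.util.List;

public class ShardRouter {
    // Maps a document id to one of numShards shards; floorMod keeps negative hashes in range.
    static int shardFor(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    public static void main(String[] args) {
        int numShards = 8;
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < numShards; i++) batches.add(new ArrayList<>());
        for (String id : List.of("post-1", "post-2", "post-3", "post-42")) {
            batches.get(shardFor(id, numShards)).add(id);   // one batch per shard
        }
        for (int i = 0; i < numShards; i++) {
            System.out.println("shard " + i + ": " + batches.get(i));
        }
    }
}
```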
> 
> 
> Kindly
> 
> //Marcus
> 
> -- 
> Marcus Herou CTO and co-founder Tailsweep AB
> +46702561312
> marcus.he...@tailsweep.com
> http://www.tailsweep.com/
> http://blogg.tailsweep.com/
