You mean doc A and doc B will become one doc after adding index 2 to index 1? I don't think this is currently supported, either at the Lucene level or at the Solr level. If index 1 has m docs and index 2 has n docs, index 1 will have m+n docs after adding index 2 to index 1. Documents themselves are not modified by an index merge.
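Ning's point can be illustrated with a toy Python model (not real Lucene code; the dict-based "index" is purely illustrative): an addIndexes-style merge is a union of document lists, never a join on a field.

```python
# Toy model of IndexWriter.addIndexes* semantics (illustrative only, not
# the Lucene API): merging appends the source index's documents to the
# target index. Documents with the same id are NOT fused into one.

def add_indexes(target, source):
    """Append every document of `source` to `target`, unchanged."""
    target.extend(dict(doc) for doc in source)
    return target

index1 = [{"id": "X", "title": "blog entry title"}]  # m = 1 doc
index2 = [{"id": "X", "score": 1.2}]                 # n = 1 doc

merged = add_indexes(index1, index2)

# m + n documents survive, and the two id=X docs stay separate:
assert len(merged) == 2
assert merged[0] == {"id": "X", "title": "blog entry title"}
assert merged[1] == {"id": "X", "score": 1.2}
```

This is exactly the behavior Marcus observes below in MergeIndexesExampleTestBase: two docs in, two docs out.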
Cheers,
Ning

On Sat, Apr 25, 2009 at 4:03 PM, Marcus Herou <marcus.he...@tailsweep.com> wrote:
> Hmm, looking in the code for the IndexMerger in Solr
> (org.apache.solr.update.DirectUpdateHandler2),
> I see that IndexWriter.addIndexesNoOptimize(dirs) is used (a union of
> indexes)?
>
> And the test class org.apache.solr.client.solrj.MergeIndexesExampleTestBase
> suggests:
> add doc A to index1 with id=AAA,name=core1
> add doc B to index2 with id=BBB,name=core2
> merge the two indexes into one index which then contains both docs.
> The resulting index will have 2 docs.
>
> Great, but in my case I think it should work more like this:
>
> add doc A to index1 with id=X,title=blog entry title,description=blog entry
> description
> add doc B to index2 with id=X,score=1.2
> somehow add index2 to index1 so that id=X has score=1.2 when searching in
> index1
> The resulting index should have 1 doc.
>
> So this is not really what I want, right?
>
> Sorry for being a smart-ass...
>
> Kindly
>
> //Marcus
>
> On Sat, Apr 25, 2009 at 5:10 PM, Marcus Herou
> <marcus.he...@tailsweep.com> wrote:
>
>> Guys!
>>
>> Thanks for these insights. I think we will head for a Lucene-level merging
>> strategy (two or more indexes).
>> When merging, I guess the second index needs to have the same doc ids
>> somehow. This is an internal id in Lucene, not that easy to get hold of,
>> right?
>>
>> So you are saying that the Solr ExternalFileField + FunctionQuery stuff
>> would not work very well performance-wise, or what do you mean?
>>
>> I sure like bleeding edge :)
>>
>> Cheers dudes
>>
>> //Marcus
>>
>> On Sat, Apr 25, 2009 at 3:46 PM, Otis Gospodnetic <
>> otis_gospodne...@yahoo.com> wrote:
>>
>>> I should emphasize that the PR trick I mentioned is something you'd do at
>>> the Lucene level, outside Solr, and then you'd just slip the modified
>>> index back into Solr.
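[For reference, the ExternalFileField setup Marcus asks about above is configured in schema.xml; a rough sketch based on Solr 1.x docs follows. The field name "rank" and file contents are illustrative, not from this thread:]

```xml
<!-- schema.xml sketch: a field whose values live OUTSIDE the index, in a
     file named external_rank in the index data directory, one "key=value"
     line per document. -->
<fieldType name="externalScore" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="float"/>
<field name="rank" type="externalScore"/>

<!-- data/external_rank contents (keyField value = float score):
     X=1.2
-->
```

Values of such a field can only be used in function queries (e.g. for boosting), not searched or returned directly, but the external file can be swapped without reindexing, which is why it comes up as an alternative to index merging here.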
>>> Or, if you like the bleeding edge, perhaps you can make use of Ning Li's
>>> Solr index merging functionality (patch in JIRA).
>>>
>>> Otis
>>> --
>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>
>>> ----- Original Message ----
>>> > From: Otis Gospodnetic <otis_gospodne...@yahoo.com>
>>> > To: solr-user@lucene.apache.org
>>> > Sent: Saturday, April 25, 2009 9:41:45 AM
>>> > Subject: Re: Date faceting - howto improve performance
>>> >
>>> > Yes, you could simply round the date; no need for a non-date-type field.
>>> > Yes, you can add a field after the fact by making use of ParallelReader
>>> > and merging (I don't recall the details; search the ML for ParallelReader
>>> > and Andrzej). I remember he once provided the working recipe.
>>> >
>>> > Otis
>>> > --
>>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>> >
>>> > ----- Original Message ----
>>> > > From: Marcus Herou
>>> > > To: solr-user@lucene.apache.org
>>> > > Sent: Saturday, April 25, 2009 6:54:02 AM
>>> > > Subject: Date faceting - howto improve performance
>>> > >
>>> > > Hi.
>>> > >
>>> > > One of our faceting use cases:
>>> > > We are creating trend graphs of how many blog posts contain a certain
>>> > > term, grouped by day/week/year etc. with the nice DateMathParser
>>> > > functions.
>>> > >
>>> > > The performance degrades really fast and consumes a lot of memory,
>>> > > which forces an OOM from time to time.
>>> > > We think it is due to the fact that the cardinality of the field
>>> > > publishedDate in our index is huge, almost equal to the number of
>>> > > documents in the index.
>>> > >
>>> > > We need to address that...
>>> > >
>>> > > Some questions:
>>> > >
>>> > > 1. Can a date field have other date formats than the default of
>>> > > yyyy-MM-dd HH:mm:ssZ?
>>> > >
>>> > > 2. We are thinking of adding a field to the index which has the format
>>> > > yyyy-MM-dd to reduce the cardinality. If that field can't be a date,
>>> > > it could perhaps be a string, but the question then is whether
>>> > > faceting can be used?
>>> > >
>>> > > 3. Since we now already have such a huge index, is there a way to add
>>> > > a field afterwards and apply it to all documents without actually
>>> > > reindexing the whole shebang?
>>> > >
>>> > > 4. If the field cannot be a string, can we just leave out the
>>> > > hour/minute/second information to reduce the cardinality and improve
>>> > > performance? Example: 2009-01-01 00:00:00Z
>>> > >
>>> > > 5. I am afraid that we need to reindex everything to get this to work
>>> > > (which negates Q3). We have 8 shards as of now; what would the most
>>> > > efficient way be to reindex the whole shebang? Dump the entire
>>> > > database to disk (sigh), create many XML file splits and use curl in
>>> > > a random/hash(numServers) manner on them?
>>> > >
>>> > > Kindly
>>> > >
>>> > > //Marcus
>>> > >
>>> > > --
>>> > > Marcus Herou CTO and co-founder Tailsweep AB
>>> > > +46702561312
>>> > > marcus.he...@tailsweep.com
>>> > > http://www.tailsweep.com/
>>> > > http://blogg.tailsweep.com/
>>
>> --
>> Marcus Herou CTO and co-founder Tailsweep AB
>> +46702561312
>> marcus.he...@tailsweep.com
>> http://www.tailsweep.com/
>> http://blogg.tailsweep.com/
>
> --
> Marcus Herou CTO and co-founder Tailsweep AB
> +46702561312
> marcus.he...@tailsweep.com
> http://www.tailsweep.com/
> http://blogg.tailsweep.com/
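[The day-rounding idea in questions 2 and 4 can be sketched outside Solr. A minimal Python illustration, assuming timestamps in the ISO form Solr's DateField actually uses (yyyy-MM-dd'T'HH:mm:ssZ); the sample values are made up:]

```python
from datetime import datetime

def round_to_day(ts: str) -> str:
    """Truncate a UTC timestamp to midnight, keeping a date-typed format."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
    return dt.strftime("%Y-%m-%dT00:00:00Z")

stamps = [
    "2009-04-25T17:03:22Z",
    "2009-04-25T09:41:45Z",
    "2009-04-26T06:54:02Z",
]
rounded = [round_to_day(s) for s in stamps]

assert rounded == [
    "2009-04-25T00:00:00Z",
    "2009-04-25T00:00:00Z",
    "2009-04-26T00:00:00Z",
]
# Distinct values drop from 3 to 2 -- at index scale, from roughly one
# value per document down to one value per day, which is what makes
# date faceting on the rounded field cheap.
assert len(set(stamps)) == 3 and len(set(rounded)) == 2
```

Because the rounded value is still a valid date, the field can stay a date type (keeping DateMathParser faceting) while cutting the cardinality that is causing the OOMs.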