I'm very interested in the answer to this thread, as my problem is much the same. The main difference in my case is probably the large number of SKUs I may have per product.
David

On Thu, Jan 14, 2010 at 2:35 AM, Kelly Taylor <wired...@hotmail.com> wrote:

>
> Hoss,
>
> Would you suggest using dedup for my use case; and if so, do you know of a
> working example I can reference?
>
> I don't have an issue using the patched version of Solr, but I'd much rather
> use the GA version.
>
> -Kelly
>
>
> hossman wrote:
> >
> > : Dedupe is completely the wrong word. Deduping is something else
> > : entirely - it is about trying not to index the same document twice.
> >
> > Dedup can also certainly be used with field collapsing -- that was one of
> > the initial use cases identified for the SignatureUpdateProcessorFactory
> > ... you can compute an 'expensive' signature when adding a document, index
> > it, and then FieldCollapse on that signature field.
> >
> > This gives you "query time deduplication" based on a value computed when
> > indexing (the canonical example is multiple URLs referencing the "same"
> > content but with slightly different boilerplate markup). You can use a
> > Signature class that recognizes the boilerplate and computes an identical
> > signature value for each URL whose content is "the same" but still index
> > all of the URLs and their content as distinct documents ... so use cases
> > where people only want "distinct" URLs work using field collapse, but by
> > default all matching documents can still be returned and searches on text
> > in the boilerplate markup also still work.
> >
> > -Hoss
> >
>
> --
> View this message in context:
> http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27155115.html
> Sent from the Solr - User mailing list archive at Nabble.com.
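
To make sure I follow the idea Hoss describes, here is a rough sketch in plain Python of "compute a signature at index time, collapse on it at query time". This is not Solr code and does not use the SignatureUpdateProcessorFactory or the field collapsing patch; the boilerplate pattern, the function names (signature, index, search) and the sample documents are all made up for illustration.

```python
import hashlib
import re
from collections import OrderedDict

# Hypothetical boilerplate rule: anything inside <nav>...</nav> or <footer>...</footer>.
BOILERPLATE = re.compile(r"<(nav|footer)>.*?</\1>", re.DOTALL)


def signature(content):
    """Strip the assumed boilerplate, normalize whitespace and case, then hash."""
    core = BOILERPLATE.sub(" ", content)
    core = " ".join(core.split()).lower()
    return hashlib.md5(core.encode("utf-8")).hexdigest()


def index(docs):
    """Index-time step: keep every document and store its signature alongside it."""
    return [dict(doc, sig=signature(doc["content"])) for doc in docs]


def search(indexed, term, collapse=False):
    """Query-time step: match on the full content (boilerplate included);
    optionally keep only the first hit for each signature value."""
    hits = [d for d in indexed if term.lower() in d["content"].lower()]
    if not collapse:
        return hits
    first_per_sig = OrderedDict()
    for d in hits:
        first_per_sig.setdefault(d["sig"], d)
    return list(first_per_sig.values())


# Made-up sample data: the first two URLs differ only in their <nav> boilerplate.
docs = [
    {"url": "http://example.com/a",
     "content": "<nav>Home | About</nav> Solr field collapsing notes"},
    {"url": "http://example.com/b?ref=1",
     "content": "<nav>Home | Contact</nav> Solr field collapsing notes"},
    {"url": "http://example.com/c",
     "content": "<nav>Home | About</nav> Something else about Solr"},
]
indexed = index(docs)
```

With this data, search(indexed, "solr") returns all three documents, while search(indexed, "solr", collapse=True) returns one document per signature. A search for "contact" still finds the second URL, because the boilerplate is only ignored when computing the signature, not when indexing the content.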