Two Peters (or rather a stupid English bloke who can't work out how to type fancy accents) :-)
Sorry Péter (took me 10 minutes to work out I could cut and paste), my reply was to the clustering post by Peter Sturge. Clustering sounds great, but being able to define a thesaurus scheme exactly would be good too.

2010/12/10 Péter Király <[email protected]>
> Hi Lee,
>
> according to my vision the user could decide which relationship types
> he would like to attach to his search, and the application would call
> his attention to other possibilities. So there would be no heuristic
> method applied, because e.g. broader terms would cause lots of
> misleading results.
>
> Péter
>
> 2010/12/10 lee carroll <[email protected]>:
> > Hi Peter,
> >
> > That's way too clever for me :-)
> > Discovering thesaurus relationships would be fantastic, but it's not clear
> > what heuristics you would need to use to discover broader, narrower,
> > related documents etc. Although I might be doing the clustering down,
> > I'm sceptical about the accuracy.
> >
> > cheers Lee c
> >
> > On 10 December 2010 09:38, Peter Sturge <[email protected]> wrote:
> >
> >> Hi Lee,
> >>
> >> Perhaps Solr's clustering component might be helpful for your use case?
> >> http://wiki.apache.org/solr/ClusteringComponent
> >>
> >> On Fri, Dec 10, 2010 at 9:17 AM, lee carroll
> >> <[email protected]> wrote:
> >> > Hi Chris,
> >> >
> >> > It's all a bit early in the morning for this mind :-)
> >> >
> >> > The question asked, in good faith, was: does Solr support or extend to
> >> > implementing a thesaurus? It looks like it does not, which is fine. It
> >> > does support synonyms and synonym rings, which is again fine. The ski
> >> > example was an illustration in response to a follow-up question for
> >> > more explanation on what a thesaurus is.
> >> >
> >> > An attempt at an answer to "why a thesaurus?" is below.
> >> >
> >> > Use case 1: improve facets
> >> >
> >> > Motivation
> >> > Unstructured lists of labels in facets offer very poor user experience.
> >> > Similar to tag clouds, users find them arbitrary, without focus and
> >> > often overwhelming. Labels in facets which are grouped in meaningful
> >> > ways relevant to the user increase engagement, perceived relevance and
> >> > user satisfaction.
> >> >
> >> > Solution
> >> > A thesaurus of term relationships could be used to group facet labels.
> >> >
> >> > Implementation
> >> > (er, completely out of my depth at this point)
> >> > Thesaurus relationships defined in a simple text file:
> >> > term, bt=>term,term nt=> term, term rt=>term, term, pt=>term
> >> > If a search specifies a facet to be returned, the field terms are
> >> > identified by reading the thesaurus into groups: broader terms,
> >> > narrower terms, related terms etc.
> >> > These groups are returned as part of the response for the UI to display
> >> > faceted labels as broader, narrower, related terms etc.
> >> >
> >> > Use case 2: Increase synonym search precision
> >> >
> >> > Motivation
> >> > Synonym rings do not allow differences between synonyms to be
> >> > identified. Rarely are synonyms exactly equivalent. This leads to a
> >> > decrease in search precision.
> >> >
> >> > Solution
> >> > Boost queries based on search term thesaurus relationships.
> >> >
> >> > Implementation
> >> > (again completely out of depth here)
> >> > Allow terms in the index to be identified as bt, nt, ... terms of the
> >> > search term. Allow the query parser to boost terms differentially based
> >> > on these thesaurus relationships.
> >> >
> >> > As for the x and y stuff I'm not sure; like I say, it's quite early in
> >> > the morning for me. I'm sure there may well be a different way of
> >> > achieving the above (but note it is more than a hierarchy). However,
> >> > the librarians have been doing this for 50 years now.
> >> >
> >> > Again though, just to repeat, this is hardly a killer for us.
> >> > We've looked at Solr for a project; created a prototype; generated tons
> >> > of questions, had them answered in the main by the docs, some on this
> >> > list, and been amazed at the fantastic results Solr has given us. In
> >> > fact, with a combination of keepwords and synonyms we have got a pretty
> >> > nice simple set of facet labels anyway (my motivation for the original
> >> > question), so our corpus at the moment does not really need a
> >> > thesaurus! :-)
> >> >
> >> > Thanks Lee
> >> >
> >> > On 9 December 2010 23:38, Chris Hostetter <[email protected]> wrote:
> >> >
> >> >> : a term can have a Preferred Term (PT), many Broader Terms (BT), many
> >> >> : Narrower Terms (NT), Related Terms (RT) etc
> >> >> ...
> >> >> : User supplied Term is say : Ski
> >> >> :
> >> >> : Preferred term: Skiing
> >> >> : Broader terms could be : Ski and Snow Boarding, Mountain Sports, Sports
> >> >> : Narrower terms: down hill skiing, telemark, cross country
> >> >> : Related terms: boarding, snow boarding, winter holidays
> >> >>
> >> >> I'm still lost.
> >> >>
> >> >> You've described a black box with some sample input ("Ski") and some
> >> >> corresponding sample output (PT=..., BT=..., NT=..., RT=....) -- but you
> >> >> haven't explained what you want to do with that black box. Assuming such
> >> >> a black box existed in Solr, what are you expecting/hoping to do with it?
> >> >> How would such a black box modify Solr's user experience? What is your
> >> >> goal?
> >> >>
> >> >> Smells like an XY Problem...
> >> >> http://people.apache.org/~hossman/#xyproblem
> >> >>
> >> >> Your question appears to be an "XY Problem" ...
> >> >> that is: you are dealing
> >> >> with "X", you are assuming "Y" will help you, and you are asking about
> >> >> "Y" without giving more details about the "X" so that we can understand
> >> >> the full issue. Perhaps the best solution doesn't involve "Y" at all?
> >> >> See Also: http://www.perlmonks.org/index.pl?node_id=542341
> >> >>
> >> >> -Hoss
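
For illustration only, the two use cases discussed in the thread (grouping facet labels by thesaurus relationship, and boosting query terms by relationship) might be sketched as below. This is not a Solr feature and not anyone's actual implementation: the function names, the boost values, and the exact reading of the "term, bt=>term,term nt=> term, term ..." file notation are all assumptions. The only real syntax relied on is the Lucene query parser's `term^boost` form; the sketch also does not escape special query characters inside terms.

```python
import re
from collections import defaultdict

# Hypothetical reading of the thread's notation, one entry per line:
#   term, bt=>term,term nt=>term,term rt=>term,term pt=>term
RELATION_MARKER = re.compile(r"\b(pt|bt|nt|rt)\s*=>")


def parse_thesaurus_line(line):
    """Return (term, {relation: [terms]}) for one thesaurus line."""
    head, _, rest = line.partition(",")
    term = head.strip().lower()
    # Splitting on a pattern with a capture group keeps the markers:
    # [prefix, rel1, body1, rel2, body2, ...]
    parts = RELATION_MARKER.split(rest)
    relations = {}
    for rel, body in zip(parts[1::2], parts[2::2]):
        relations[rel] = [t.strip().lower() for t in body.split(",") if t.strip()]
    return term, relations


def group_facet_labels(search_term, facet_labels, thesaurus):
    """Use case 1: bucket facet labels as pt/bt/nt/rt (or 'other')."""
    relations = thesaurus.get(search_term.lower(), {})
    lookup = {t: rel for rel, terms in relations.items() for t in terms}
    groups = defaultdict(list)
    for label in facet_labels:
        groups[lookup.get(label.lower(), "other")].append(label)
    return dict(groups)


def boosted_query(search_term, thesaurus, boosts):
    """Use case 2: expand a term, weighting each relationship differently.

    `boosts` maps a relation code to a boost factor (assumed values);
    relations without a boost are left out of the query.
    """
    relations = thesaurus.get(search_term.lower(), {})
    clauses = ['"%s"' % search_term]
    for rel, terms in relations.items():
        boost = boosts.get(rel)
        if boost is None:
            continue
        clauses.extend('"%s"^%s' % (t, boost) for t in terms)
    return " OR ".join(clauses)


# Example data taken from the "Ski" illustration in the thread.
entry = ("ski, pt=>skiing bt=>ski and snow boarding, mountain sports, sports "
         "nt=>down hill skiing, telemark, cross country "
         "rt=>boarding, snow boarding, winter holidays")
term, relations = parse_thesaurus_line(entry)
thesaurus = {term: relations}

facet_groups = group_facet_labels("Ski", ["Telemark", "Sports", "Chalets"], thesaurus)
query = boosted_query("Ski", thesaurus, {"pt": 2.0, "nt": 0.5})
```

Here the grouping happens entirely outside the search engine, on facet labels that a normal facet request has already returned, which is one way to try the idea without changing the index. The boosted query string could then be passed as an ordinary query parameter.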
