Re: multiple phrase search for topic

2011-11-02 Thread deb.lucene
Hi Ian, the other question I had was about the quality of the results (especially in the top ranks). But then I used Lucene's "explain" functionality and observed how the tf/idf parameters were behaving. I would be interested in seeing any work which modified the "similarity" function in
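For readers following along, here is a rough sketch of the factors that the "explain" output reports for a matching term. It is a standalone re-implementation modelled on the classic Lucene 3.x DefaultSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), length norm = 1 / sqrt(numTerms)), not Lucene code. Real field norms are also encoded lossily into a single byte, so explain values will differ slightly. `binaryTf` is a hypothetical example of the kind of change a modified similarity might make.

```java
import java.util.Locale;

/**
 * Illustrative sketch (not Lucene code) of the classic tf/idf factors printed
 * by explain(), modelled on Lucene 3.x DefaultSimilarity. binaryTf is a
 * hypothetical alternative, not part of Lucene.
 */
class TfIdfSketch {

    /** Default tf: square root of the raw term (or phrase) frequency. */
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    /** Hypothetical modified tf: a match counts once, repeats add nothing. */
    static double binaryTf(double freq) {
        return freq > 0 ? 1.0 : 0.0;
    }

    /** Default idf: 1 + ln(numDocs / (docFreq + 1)); rarer terms weigh more. */
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    /** Default length norm (ignoring boosts): 1 / sqrt(terms in the field). */
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /** The "fieldWeight" line of explain(): tf * idf * fieldNorm. */
    static double fieldWeight(double freq, int docFreq, int numDocs, int numTerms) {
        return tf(freq) * idf(docFreq, numDocs) * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // A term seen 9 times in a 100-term field, present in 9 of 1000 docs.
        System.out.printf(Locale.ROOT, "tf=%.3f idf=%.3f fieldWeight=%.3f%n",
                tf(9), idf(9, 1000), fieldWeight(9, 9, 1000, 100));
    }
}
```

If I remember the 3.x API correctly, the usual way to change these factors is to subclass DefaultSimilarity, override `tf(float)` and/or `idf(int, int)`, and install the subclass with `IndexSearcher.setSimilarity`; treat that as a pointer to check against the javadocs for your version.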

Re: multiple phrase search for topic

2011-10-31 Thread Ian Lea
Nice not to have to worry about performance. You say there is another question, but not what it is. The code you show looks like it should do what you want. For anything non-trivial I prefer to build the queries directly in code rather than concatenating strings to be parsed, because I find it h
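To make the string-concatenation concern concrete: if a query is assembled as text for the classic query parser, every phrase has to be escaped before it is wrapped in quotes, otherwise a stray `"` or `\` inside it breaks the quoting. The helper below is a hypothetical, standalone sketch (not Lucene code) whose character list is modelled on what Lucene's static `QueryParser.escape` handles, plus `/`, which later versions treat as special. Building `PhraseQuery` and `BooleanQuery` objects directly avoids the parser and therefore all of this.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical helper (not Lucene code) showing what string-built queries must
 * get right: escape special characters, then quote and join the phrases.
 */
class QueryStringSketch {

    // Characters treated as special by the classic query syntax (see lead-in).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Backslash-escapes every special character in a user-supplied phrase. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Quotes each escaped phrase and joins them with an operator such as AND or OR. */
    static String joinPhrases(List<String> phrases, String operator) {
        return phrases.stream()
                .map(p -> "\"" + escape(p) + "\"")
                .collect(Collectors.joining(" " + operator + " "));
    }
}
```

For example, `joinPhrases(List.of("Apple", "Steve Jobs"), "AND")` yields `"Apple" AND "Steve Jobs"`, while a phrase such as `C++` comes out as `C\+\+`.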

Re: multiple phrase search for topic

2011-10-31 Thread deb.lucene
Thanks, Ian, for your response. This is a one-time offline program, so I am not concerned about performance (i.e., speed, etc.). One more question: there are some situations where I need to run an AND clause (i.e., more than one phrase, such as "Apple" AND "Steve Jobs"). My approach was something like:
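The semantics of `"Apple" AND "Steve Jobs"` can be sketched without Lucene: a document qualifies only if it contains every phrase, and a phrase (like a PhraseQuery with the default slop of 0) matches only when its terms sit at consecutive positions. The class below is a hypothetical in-memory model, and its tokenizer (lower-casing, splitting on non-alphanumerics) is an assumption standing in for a real analyzer. In Lucene itself the equivalent would be a BooleanQuery with one PhraseQuery clause per phrase, each added with `BooleanClause.Occur.MUST`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical in-memory model of AND-ed exact phrases (not Lucene code). */
class PhraseAndSketch {

    /** Crude analyzer stand-in: lower-case, split on non-alphanumerics. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** True if the phrase's tokens occur at consecutive positions (slop 0). */
    static boolean containsPhrase(List<String> doc, List<String> phrase) {
        if (phrase.isEmpty()) {
            return false;
        }
        for (int start = 0; start + phrase.size() <= doc.size(); start++) {
            if (doc.subList(start, start + phrase.size()).equals(phrase)) {
                return true;
            }
        }
        return false;
    }

    /** AND semantics: every phrase must be present, like MUST clauses. */
    static boolean matchesAll(String text, String... phrases) {
        List<String> doc = tokenize(text);
        for (String p : phrases) {
            if (!containsPhrase(doc, tokenize(p))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String doc = "Steve Jobs returned to Apple in 1997.";
        System.out.println(matchesAll(doc, "Apple", "Steve Jobs")); // true
        System.out.println(matchesAll(doc, "Apple", "Jobs Steve")); // false: wrong order
    }
}
```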

Re: multiple phrase search for topic

2011-10-28 Thread Ian Lea
Seems to me your approach should work, although I'd worry about performance.

> A lot of top-ranked documents are not the best candidates for the "Software
> Technology" topic, even though they contain the phrases (not very frequent)

Surely the docs that contain the phrases are going to be top