How to efficiently get # of search results, per attribute

2004-11-13 Thread Chris Lamprecht
I'd like to implement a search across several types of entities,
let's say, classes, professors, and departments.  I want the user to
be able to enter a simple, single query and not have to specify what
they're looking for.  Then I want the search results to be something
like this:

Search results for: philosophy boyer

Found: 121 classes - 5 professors - 2 departments

search results here...


I know I could iterate through every hit returned and count them up
myself, but that seems inefficient if there are lots of results.  Is
there some other way to get this kind of information from the search
result set?  My other ideas are: doing a separate search for each result
type, or storing different types in different indexes.  Any
suggestions?  Thanks for your help!

-Chris

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: How to efficiently get # of search results, per attribute

2004-11-13 Thread Nader Henein
It depends on how many results they're looking through. Here are the two
scenarios I see:

1] If you don't have that many records, you can fetch all the results and
then do a post-parsing step to determine the totals.

2] If you have a lot of entries in each category and you're worried
about fetching thousands of records every time, you can keep
separate indices per category and search them in parallel (not
Lucene Parallel Search). Fetch only up to 100 hits from each one
(for efficiency); each search still gives you the total to display.
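
The second scenario can be sketched roughly as follows. This is a minimal, library-free illustration in Python, not Lucene code: the in-memory `INDEXES` dictionary, the whitespace term matching, and the names `search_one`, `search_all`, and `summary_line` are all assumptions made up for the example. In a real application each category would be its own index, and the searcher would normally report the total hit count without loading every matching document.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for one index per category. In a real application
# each value would be a separate index with its own searcher.
INDEXES = {
    "classes": ["intro to philosophy", "philosophy of mind", "organic chemistry"],
    "professors": ["boyer", "smith"],
    "departments": ["philosophy", "chemistry"],
}

def search_one(category, terms, max_hits=100):
    """Return (category, total number of matches, first max_hits matches)."""
    docs = INDEXES[category]
    # Toy matching rule (an assumption): a document matches if it contains
    # any query term as a whole word.
    matches = [d for d in docs if any(t in d.split() for t in terms)]
    return category, len(matches), matches[:max_hits]

def search_all(query, max_hits=100):
    """Search every category concurrently; keep the total and capped hits."""
    terms = query.lower().split()
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda c: search_one(c, terms, max_hits), INDEXES)
        return {cat: (total, hits) for cat, total, hits in results}

def summary_line(results):
    """Format the per-category totals like the line in the original question."""
    parts = [f"{total} {cat}" for cat, (total, _) in results.items()]
    return "Found: " + " - ".join(parts)

if __name__ == "__main__":
    print(summary_line(search_all("philosophy boyer")))
```

The point of the design is that the display total comes from each search itself, so only the capped page of hits per category ever needs to be fetched.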

Either way, you can use RAMDirectory if you need more speed from the
search, but whichever approach you choose, I would recommend that you
sit down and do some number crunching to figure out which way to go.

Hope this helps
Nader Henein



RE: How to efficiently get # of search results, per attribute

2004-11-13 Thread Chuck Williams
My Lucene application includes multi-faceted navigation that does a more
complex version of what Chris describes.  I've got 5 different taxonomies into
which every indexed item is classified.  The largest of the taxonomies
has over 15,000 entries while the other 4 are much smaller. For every
search query, I determine the best small set of nodes from each taxonomy
to present to the user as drill-down options, and provide the counts
regarding how many results fall under each of these nodes.  At present I
only have about 25,000 indexed objects and usually no more than 1,000
results from the initial query.  To determine the drill-down options and
counts, I scan up to 1,000 results computing the counts for all nodes
into which these results classify.  Then for each taxonomy I pick the
best drill-down options available (orthogonal set with reasonable
branching factor that covers all results) and present them with their
counts.  If there are more than 1,000 results, I extrapolate the
computed counts to estimate the actual counts on the entire set of
results.  This is all done with a single index and a single search.
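
The scan-and-extrapolate step described above might look roughly like this. It is a minimal Python sketch under stated assumptions, not the code from the application described: the function name `estimate_counts`, the representation of each hit as a list of taxonomy node names, and in particular the simple linear scaling used for extrapolation are all hypothetical, since the message does not say how the extrapolation is done.

```python
from collections import Counter

def estimate_counts(hits, total_hits, sample_size=1000):
    """Estimate per-node result counts from the first sample_size hits.

    hits: iterable yielding, for each result in rank order, the list of
          taxonomy nodes that result is classified under (assumed shape).
    total_hits: total number of results the query matched.
    """
    counts = Counter()
    scanned = 0
    for nodes in hits:
        if scanned >= sample_size:
            break
        # Count each node at most once per result.
        counts.update(set(nodes))
        scanned += 1
    if scanned == 0:
        return {}
    # Assumption: scale linearly when only part of the result set was scanned.
    scale = total_hits / scanned if total_hits > scanned else 1.0
    return {node: round(n * scale) for node, n in counts.items()}

if __name__ == "__main__":
    sample = [["ethics", "history"], ["ethics"], ["history"], ["ethics"]]
    print(estimate_counts(sample, total_hits=4, sample_size=2))
```

Linear scaling treats the top-ranked hits as representative of the whole result set, which may not hold when ranking correlates with category, so the scaled numbers are best presented as estimates.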

The total time required for performing this computation for the one
large taxonomy is under 10 ms, running in full debug mode in my IDE.  The
query response time overall is subjectively instantaneous at the UI
(Google-speed or better).  So, unless some dimension of the problem is
much bigger than mine, I doubt performance will be an issue.

Chuck




Re: How to efficiently get # of search results, per attribute

2004-11-13 Thread Chris Lamprecht
Nader and Chuck,

Thanks for the responses, they're both helpful.  My index sizes will
begin on the order of 200,000 classes, and 20,000 instructors (and
much fewer departments), and grow over time to maybe a few million
classes.  Compared to some of the numbers I've seen on this mailing
list, my dataset is fairly small.  I think I'll not worry about
performance for now, until and unless it becomes an issue.

-Chris
