Re: [CODE4LIB] Multi-server Search Engine response times: was - OASIS SRU and CQL, access to most-current drafts
Agreed it is a problem. What MSSEs do (when operating this way) is make this issue a response-time-dependent one. Users themselves make it a Source-dependent one (they only look at results from the sites they decide to search). Ranking algorithms make it an algorithm-dependent one (their algorithm determines what ends up at the top of the list). In all cases the results are vying for the few slots the user will actually look at: "above the fold", "first 3", "first page", etc. The problem is that all results cannot be first, and we have no way to insist that the user look at all of them and make an informed selection. Anyway, this can go all the way back to the collection policies of the library and the aggregators, and even to the cussedness of authors in not writing articles on exactly the right topic. (Bad authors!) The MSSEs try to be even-handed about it, but it doesn't always work.

Possibly saving technologies here are text analysis and faceting. These can help take "horizontal slices" out of the vertically ordered list of results. That means the user can select another list, which will be ordered a bit differently, and, with text analysis and facets applied again, slice and dice those results in turn. But in the end it requires enough interest from the user to do some refinement, and that battles with "good enough".

Peter
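The "horizontal slices" idea above can be sketched in a few lines. This is purely illustrative: the records and the `source` and `year` fields are invented, and a real faceting engine works over indexed fields rather than an in-memory list.

```python
from collections import defaultdict

# A merged result list in arrival (i.e. Source-speed) order -- the
# "vertically ordered" list. All records here are made up.
results = [
    {"title": "Article 1", "source": "PsycINFO", "year": 2009},
    {"title": "Article 2", "source": "ERIC",     "year": 2008},
    {"title": "Article 3", "source": "PsycINFO", "year": 2008},
    {"title": "Article 4", "source": "JSTOR",    "year": 2009},
]

def facet(records, field):
    """Group a vertically ordered list into horizontal slices by one field."""
    slices = defaultdict(list)
    for rec in records:
        slices[rec[field]].append(rec)
    return dict(slices)

# The user picks one slice, then can facet again (e.g. by "year") to
# keep slicing and dicing the results.
by_source = facet(results, "source")
```

The point of the sketch is only that each facet value gives the user a differently ordered sub-list to inspect, instead of the single speed-ordered pile.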
Re: [CODE4LIB] Multi-server Search Engine response times: was - OASIS SRU and CQL, access to most-current drafts
> And if the majority of users are only looking at results
> from one resource... why do a broadcast multi-server
> search in the first place?

More than just a theoretical concern. Consider this from an article by Nina McHale:

"[R]eference and instruction staff at Auraria were asked to draw up a list of ten or so resources that would be included in a general-focus “Quick Search” . . . [h]owever, in practice, the result was disappointing. The results returned from the fastest resource were the results on top of the pile, and of the twelve resources chosen, PsycINFO routinely returned results first. Reference and instruction staff rightly felt that this skewed the results for a general query." [1]

One library's perspective, and I'm pretty sure they were not using Muse. But conceptually the concern would be the same.

--Dave

[1] http://webserviceslibrarian.blogspot.com/2009/01/why-reference-and-instruction.html

==
David Walker
Library Web Services Manager
California State University
http://xerxes.calstate.edu
Re: [CODE4LIB] Multi-server Search Engine response times: was - OASIS SRU and CQL, access to most-current drafts
Aha, but we get interleaved results from the different Sources. So the results are not "all A", "all B", "all... Even if the results come as complete "sets of 10", we internally collect them asynchronously as they are processed. The number of buffers and processing stages is quite large, so the parallel-processing nature of the multi-tasking means that the results get interleaved. It is still possible that one set of results comes in so far in advance of everything else that it is completely processed before anything else arrives; then the display is "all A", "others".

However, the major benefit is that the results from all the Sources are there at once, so even if the user uses the system to "skip" from Source to Source, it is quicker than running the search on all the Sources individually. And, yes, you can individually save "a few here" and "one or two there" to make your combined chosen few.

But first-page-only viewing does mean that the fastest Sources get the best spots. Is this an incentive to speed up the search systems? (It has actually happened that a couple of the Sources we showed comparative response times to used the figures to get funds for hardware replacement.)

Peter
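For readers curious what "First In, First Displayed" collection looks like mechanically, here is a minimal sketch using Python's asyncio. The Source names and latencies are invented, and this is not any vendor's actual implementation; it only shows why the fastest Source claims the top of the merged list when results are appended in completion order.

```python
import asyncio

# Simulated Sources and their response times in seconds (invented numbers).
SOURCES = {"A": 0.3, "B": 0.1, "C": 0.2}

async def search(name: str, latency: float) -> tuple[str, list[str]]:
    await asyncio.sleep(latency)          # stand-in for the network round trip
    return name, [f"{name}-{i}" for i in range(3)]

async def broadcast() -> list[str]:
    tasks = [search(name, t) for name, t in SOURCES.items()]
    display: list[str] = []
    # Append each Source's hits as that Source completes. The fastest
    # Source ends up on top -- exactly the skew discussed in this thread.
    for fut in asyncio.as_completed(tasks):
        name, hits = await fut
        display.extend(hits)
    return display

order = asyncio.run(broadcast())
```

Because Source B is fastest here, `order` begins with B's records; interfiling or re-ranking on arrival would be a further step on top of this loop.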
Re: [CODE4LIB] Multi-server Search Engine response times: was - OASIS SRU and CQL, access to most-current drafts
Wait, but in the case you suspect is common, where you return results as soon as the first resource is returned, and subsequent results are added to the _end_ of the list...

I'm thinking that in most of these cases, the subsequent results will be several pages "in", and the user will never even get there. And if the majority of users are only looking at results from one resource... why do a broadcast multi-server search in the first place?
[CODE4LIB] Multi-server Search Engine response times: was - OASIS SRU and CQL, access to most-current drafts
However things are a bit different now... At the risk of opening the debate once more, and lots of lengthy discussion, let me say that our experience (as one of the handful of commercial providers of "multi-server search engines" (MSSEs? - it'll never stick, but I like it)) is:

1) Times are not slow for most installations, as they are set by default to provide incremental results in the fashion Jakub suggests ("First In, First Displayed"). So users see results driven by the time of the fastest Source, not the slowest. This means that, on average, getting the results from an MSSE can be faster than doing the same search on all of the native sites (just talking response times here, not the fact that it is one search versus N). Do the maths - it's quite fun.

2) The average "delay" for just processing the results through modern MSSEs is about 0.5 sec. Add to this, say, another 0.2 sec for two extra network hops, and the additional response time to first display is about 3/4 of a second. This is a time shift all the way down the set of results - most of which the user isn't aware of, as they are beyond the first 10 on screen, and the system allows interaction with those 10 while the rest are getting their act together. So under 1 second is added to response times which average about 5 seconds. Of course, waiting for all the results adds this time to the slowest results.

3) Most users seem happy to get things back faster and not worry too much about relevance ranking. To combat the response time issue for users who require ranked results, the incremental return can be set to show interfiled results as the later records come in, ranking them within the ones already displayed to the user. This can be disconcerting, but making sure the UI doesn't lose track of the user's focus helps. Another option is to show that "new results" are available and let the user manually click to get them incorporated - less intrusive, but an extra click!
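The arithmetic in points 1) and 2) can be made concrete. In this back-of-the-envelope sketch, only the 0.5 s processing overhead, the 0.2 s for two extra network hops, and the roughly 5 s average response come from the message above; the per-Source times are hypothetical:

```python
# Hypothetical native response times for five Sources, in seconds,
# averaging about 5 s as the message suggests.
source_times = [2.0, 4.0, 5.0, 6.0, 8.0]

# Figures quoted above: ~0.5 s processing plus ~0.2 s for two extra hops.
msse_overhead = 0.5 + 0.2

first_display = min(source_times) + msse_overhead  # incremental ("First In") display
all_results   = max(source_times) + msse_overhead  # waiting for the slowest Source
one_by_one    = sum(source_times)                  # searching each native site in turn

print(f"first results via MSSE: {first_display:.2f} s")   # 2.70 s
print(f"all results via MSSE:   {all_results:.2f} s")     # 8.70 s
print(f"searching each site:    {one_by_one:.2f} s")      # 25.00 s
```

With incremental display, time-to-first-result tracks the fastest Source plus under a second of overhead, which is why it beats visiting the N sites individually even though the complete merged set takes as long as the slowest Source.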
General experience with the incremental displays shows that users are happiest with them when there is an obvious and clear reason for the new additions. The most accepted case is where the ranking criterion is price; the user is always happy to see a cheaper item arrive. It really doesn't work well for titles sorted alphabetically - unless the user is looking for a specific title which should occur at the beginning of the list. These examples illustrate the general point: if the user is focused on specific items at the top of the list, then they are generally happy with an updating list; if they are more in "browse" mode, then the updating list, when it is on screen, is just a distraction.

Overall, our experience from our partners' users is that they would rather see things quickly than wait for relevance ranking. I suspect partly (can of worms coming) because the existing ranking schemes don't make a lot of difference (ducks quickly).

Peter

Peter Noerr
CTO, Museglobal
www.museglobal.com

> -----Original Message-----
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Walker, David
> Sent: Tuesday, May 18, 2010 12:44 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts
>
> > in order to provide decent user experience you need to be
> > able to present some results "sooner" than others.
>
> I would actually question whether this is really necessary.
>
> A few years back, I did a big literature review on metasearch, as well
> as looked at a good number of usability studies that libraries did
> with metasearch systems.
>
> One thing that stood out to me was that the literature (written by
> librarians and technologists) was very concerned about the slow search
> times of metasearch, often seeing it as a deal-breaker.
>
> And yet, in the usability studies, actual students and faculty were far
> less concerned about the search times -- within reason, of course.
>
> I thought the UC Santa Cruz study [1] summarized the point well: "Users
> are willing to wait as long as they think that they will get useful
> results. Their perceptions of time depend on this belief."
>
> Trying to return the results of a metasearch quickly just for the sake
> of returning them quickly I think introduces other problems (in terms
> of relevance ranking and presentation) that do far more to negatively
> impact the user experience. Just my opinion, of course.
>
> --Dave
>
> [1] http://www.cdlib.org/services/d2d/metasearch/docs/core_ucsc_oct2004usability.pdf
>
> ==
> David Walker
> Library Web Services Manager
> California State University
> http://xerxes.calstate.edu
>
> From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Kuba
> [skoc...@gmail.com]
> Sent: Tuesday, May 18, 2010 9:57 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts