Re: [OPEN-ILS-GENERAL] Activity metric for relevance
All,

I don't have much time to go into detail right now (meetings solid this afternoon), but to get the basics out there so Thomas and others can start looking at it, I'm going to paste some SQL below and then give some basic narrative on how it could be generalized into a precalculated age scaling modification for various possible popularity metrics.

The original is based on a direct rating table, where users rate things with values between, say, 1 and 5, and the time of each rating is recorded. More recent ratings are considered more important, and the age scaling horizon defines a linear decline in importance: given an age scaling horizon of 30 days, a rating provided today is 30 times more important than a rating provided 30 days ago (or any time before that). In this scheme all ratings are counted for all time, but cutting them off at some secondary, older age would be trivial.

    CREATE TABLE rating (
        id       SERIAL PRIMARY KEY,
        usr      INT NOT NULL REFERENCES actor.usr (id),
        thing    INT NOT NULL REFERENCES thing (id), -- perhaps biblio.record_entry ...
        rating   INT NOT NULL,
        created  TIMESTAMP NOT NULL DEFAULT NOW(),
        comments TEXT
    );

    CREATE VIEW derived_rating AS
    WITH setting(max_age) AS (
        -- age scaling horizon in days; a configured 0 is treated as 1
        SELECT COALESCE(NULLIF(value,'0'),'1')::INT::TEXT AS max_age
          FROM config_flag
          WHERE name = 'rating.new_rating_bump.days' AND enabled
          LIMIT 1
    ), duplicate_list(id,dup_count) AS (
        -- ratings inside the horizon are counted once per remaining day
        SELECT id, setting.max_age::INT - DATE_PART('day', NOW() - created)::INT AS dup_count
          FROM rating, setting
          WHERE created > NOW() - (setting.max_age || ' days')::interval
            UNION
        -- ratings at or beyond the horizon are counted exactly once
        SELECT id, 1 AS dup_count
          FROM rating, setting
          WHERE created <= NOW() - (setting.max_age || ' days')::interval
    ), sized_arrays(package,rating_array) AS (
        SELECT thing, ARRAY_FILL( rating, ARRAY_APPEND(NULL::INT[], duplicate_list.dup_count) ) AS rating_array
          FROM rating JOIN duplicate_list USING (id)
    ), flattened_duplicated_ratings(package,rating) AS (
        SELECT package, UNNEST( rating_array ) AS rating FROM sized_arrays
    )
    SELECT thing.id AS package, AVG( r.rating ) AS rating
      FROM thing
        LEFT JOIN flattened_duplicated_ratings AS r ON thing.id = r.package
      GROUP BY 1;

This obviously does not work directly for things like holds or circs, where there is no inherent quality but simply existence. However, if we decided on a granularity -- days or weeks for holds, and weeks or months for circs, perhaps -- we can transform existence into quality. Consider hold count as a percentage of total holds per granularity interval as a normalizing factor:

    CREATE VIEW daily_hold_popularity AS
    WITH bib_count_by_date(thing, count, created) AS (
        -- holds per bib per day
        SELECT rhrr.bib_record AS thing, COUNT(ahr.id) AS count, DATE(ahr.request_time) AS created
          FROM reporter.hold_request_record AS rhrr
            JOIN action.hold_request AS ahr ON (ahr.id = rhrr.id)
          GROUP BY 1,3
    ), total_by_date(count, created) AS (
        -- all holds per day
        SELECT COUNT(id) AS count, DATE(ahr.request_time) AS created
          FROM action.hold_request AS ahr
          GROUP BY 2
    ), scaling_factor(value) AS (
        SELECT 1 AS value
    )
    SELECT bib_count_by_date.thing AS id,
           (( bib_count_by_date.count::DECIMAL / total_by_date.count) * scaling_factor.value) AS rating,
           bib_count_by_date.created
      FROM bib_count_by_date
        JOIN total_by_date USING (created),
        scaling_factor;

The scaling factor would allow one to change the range of possible values that holds contribute (as written, with a factor of 1, that range is 0-1; a factor of 100 would make it 0-100), and could be pulled from a global flag. Use this view in the query above instead of the rating table, and hold percentage by bib id would now be your rating value. Circs could be used in a similar fashion, as could acquisition date of copies (perhaps normalized to week-of-year, or month).

This is, again, a pretty simple, linear age scaling algorithm, but exponential or even quadratic functions are possible, and possibly useful.

Comments? Thoughts?

--miker
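For anyone who would rather trace the arithmetic outside the database, here is a minimal Python sketch of the linear age scaling described in this message. It assumes that duplicating a rating N times and then averaging is equivalent to a weighted average with weight N, where N is the horizon minus the rating's age in whole days, never less than 1. The function names and sample numbers are illustrative only; they are not part of Evergreen or of the SQL above.

```python
from datetime import datetime, timedelta


def age_weight(created, now, horizon_days=30):
    """Weight of one event: horizon minus whole days of age, never below 1."""
    age_days = (now - created).days
    return max(horizon_days - age_days, 1)


def age_scaled_average(ratings, now, horizon_days=30):
    """Weighted mean of (rating, created) pairs; None when there are no ratings."""
    total = 0
    weight_sum = 0
    for value, created in ratings:
        w = age_weight(created, now, horizon_days)
        total += value * w
        weight_sum += w
    return total / weight_sum if weight_sum else None


def hold_share(holds_for_bib, holds_total, scaling_factor=1):
    """Share of an interval's holds that belong to one bib, times a scaling factor."""
    return holds_for_bib * scaling_factor / holds_total


now = datetime(2013, 3, 15, 12, 0)
ratings = [
    (5, now),                       # rated today: weight 30
    (1, now - timedelta(days=45)),  # older than the horizon: weight 1
]
print(age_weight(now, now))              # 30
print(age_scaled_average(ratings, now))  # (5*30 + 1*1) / 31
print(hold_share(5, 200, 100))           # 2.5
```

With a 30-day horizon the single fresh rating of 5 dominates the old rating of 1, which is the behaviour the duplication trick is meant to produce.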
Re: [OPEN-ILS-GENERAL] Activity metric for relevance
I'm very fond of the idea myself and shared it with a group of SCLENDS libraries a few weeks ago and a lot of ears perked up.

On Tue, Mar 26, 2013 at 3:15 PM, Kathy Lussier kluss...@masslnc.org wrote:

> Hi all,
>
> Thanks to everyone for their feedback to this project! Mike, we hadn't considered the aging parameters as you described it, but I think it's an excellent idea and it sounds like others agree. Let's all put our heads together to see if we can make it happen. This is what I love about this community! :)
>
> Cheers!
> Kathy
>
> Kathy Lussier
> Project Coordinator
> Massachusetts Library Network Cooperative
> (508) 343-0128
> kluss...@masslnc.org
> Twitter: http://www.twitter.com/kmlussier
Re: [OPEN-ILS-GENERAL] Activity metric for relevance
First off, I think this is a good enhancement and would be very helpful to all users, particularly when searching a large database - a keyword search for Abraham Lincoln returns over 2400 hits when searching all of PINES (263 for one of the larger systems).

Mike - would your algorithm mean that, with a title like Team of Rivals (which had an initial high rate of circulation and holds, then interest fell off, to revive again with the movie), while it is first popular it rises in the search results, falls as interest tapers, and then rises again, using older and newer interest data? That way it might bump above other titles that were popular in that interval more quickly than without the algorithm?

Elaine

J. Elaine Hardy
PINES Bibliographic Projects Metadata Manager
Georgia Public Library Service
1800 Century Place, Ste 150
Atlanta, Ga. 30345-4304
404.235-7128
404.235-7201, fax
eha...@georgialibraries.org
www.georgialibraries.org
www.georgialibraries.org/pines

From: open-ils-general-boun...@list.georgialibraries.org [mailto:open-ils-general-boun...@list.georgialibraries.org] On Behalf Of Mike Rylander
Sent: Thursday, March 14, 2013 10:11 PM
To: Evergreen Discussion Group
Subject: Re: [OPEN-ILS-GENERAL] Activity metric for relevance

> Kathy,
>
> Have you considered allowing an aging parameter for some bumps, so that newer data toward the near end of the horizon is considered more important? For instance, spikes in circulation might have a larger short term effect on relevance, but over time, while still being factored into relevance, would be less important though still considered in the bump logic. I ask because I have a simple algorithm I'm using in another project, to be debuted at the conference, that may be portable to this work.
>
> --miker
Re: [OPEN-ILS-GENERAL] Activity metric for relevance
The current plan would not take into account how recent the circs (or holds) were, just that they were within a configurable time period of the time the cronjob that counts them last ran (default will likely be to include those from within the last 6 to 12 months). If you have an algorithm you think would work well and are willing to share we would gladly include that as an option when doing the work, though.

We would not, however, be able to make it a per-bump option with the way we currently plan on storing the circ and hold counts, so instead it would function as an overall modifier to the circ/hold count numbers. Though even as I type this email I have thoughts on how we could change that if the feeling is that it should be at least partially bump-to-bump configurable.

Thomas Berezansky
Merrimack Valley Library Consortium

Quoting Mike Rylander mrylan...@gmail.com:

> Kathy,
>
> Have you considered allowing an aging parameter for some bumps, so that newer data toward the near end of the horizon is considered more important? For instance, spikes in circulation might have a larger short term effect on relevance, but over time, while still being factored into relevance, would be less important though still considered in the bump logic. I ask because I have a simple algorithm I'm using in another project, to be debuted at the conference, that may be portable to this work.
>
> --miker
Re: [OPEN-ILS-GENERAL] Activity metric for relevance
Something that took into consideration how current the circs/holds were could be valuable I think in reflecting a more ... natural effect that titles have in relevancy. For example, Gone Girl by Gillian Flynn. In the first 24 hours that the bib was in our system the total holds might be relatively small compared to some other titles during the last 6 months or a year. Now, by the end of a week the holds had skyrocketed and would show up well even in a search just for girl under the proposed approach. I think taking into consideration the timeliness of those weights would give a more natural reflection of the title rising to the top over the course of the week (and significantly higher that first day).

--
Rogan Hamby, MLS, CCNP, MIA
Managers Headquarters Library and Reference Services, York County Library System

You can never get a cup of tea large enough or a book long enough to suit me.
-- C.S. Lewis http://www.goodreads.com/author/show/1069006.C_S_Lewis
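To put rough numbers on the Gone Girl example, here is a small Python sketch comparing a flat count over a fixed window with a linearly age-scaled count. The flat window is only one reading of the current plan described in this thread, the 30-day horizon is an assumption, and the hold figures are made up for illustration.

```python
def flat_count(event_ages, window_days=180):
    """Count events no older than the window; every event counts the same."""
    return sum(1 for age in event_ages if age <= window_days)


def age_scaled_count(event_ages, horizon_days=30):
    """Sum of linear age weights: horizon minus age in days, never below 1."""
    return sum(max(horizon_days - age, 1) for age in event_ages)


# Age in days of each hold request (hypothetical numbers).
new_title = [0] * 40        # 40 holds placed today
older_title = [120] * 100   # 100 holds placed four months ago

print(flat_count(new_title), flat_count(older_title))              # 40 100
print(age_scaled_count(new_title), age_scaled_count(older_title))  # 1200 100
```

Under the flat count the older title still outranks the new one; with age scaling the new title's fresh holds carry it to the top on the first day.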
Re: [OPEN-ILS-GENERAL] Activity metric for relevance
On Fri, Mar 15, 2013 at 9:00 AM, Hardy, Elaine eha...@georgialibraries.org wrote:

> First off, I think this is a good enhancement and would be very helpful to all users, particularly when searching a large database - a keyword search for Abraham Lincoln returns over 2400 hits when searching all of PINES (263 for one of the larger systems).
>
> Mike - would your algorithm mean that, with a title like Team of Rivals (which had an initial high rate of circulation and holds, then interest fell off, to revive again with the movie), while it is first popular it rises in the search results, falls as interest tapers, and then rises again, using older and newer interest data? That way it might "bump" above other titles that were popular in that interval more quickly than without the algorithm?

That's the intent, yes.

--miker

--
Mike Rylander
 | Director of Research and Development
 | Equinox Software, Inc. / Your Library's Guide to Open Source
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: mi...@esilibrary.com
 | web: http://www.esilibrary.com
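The rise, fall, and second rise asked about here can be traced with a short Python sketch. It assumes a 30-day linear horizon and two made-up bursts of circulation 200 days apart; the function name and the numbers are illustrative only, not anything taken from Evergreen.

```python
def age_scaled_score(event_days, today, horizon_days=30):
    """Sum of linear age weights for all events on or before `today`."""
    score = 0
    for day in event_days:
        age = today - day
        if age >= 0:  # ignore events that have not happened yet
            score += max(horizon_days - age, 1)
    return score


# Hypothetical circulation days: a burst at release, then a second burst
# (say, when the movie comes out).
events = [0] * 10 + [200] * 10

for today in (0, 15, 100, 200):
    print(today, age_scaled_score(events, today))
# 0 300
# 15 150
# 100 10
# 200 310
```

The score falls as the first burst ages, bottoms out at one point per old event, and climbs again when new activity arrives, while the old events still contribute a little.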
Re: [OPEN-ILS-GENERAL] Activity metric for relevance
On Fri, Mar 15, 2013 at 9:01 AM, Thomas Berezansky tsb...@mvlc.org wrote:

> The current plan would not take into account how recent the circs (or holds) were, just that they were within a configurable time period of the time the cronjob that counts them last ran (default will likely be to include those from within the last 6 to 12 months). If you have an algorithm you think would work well and are willing to share we would gladly include that as an option when doing the work, though.

I do, and I am. As time permits over the next few weeks I'll get back to this thread.

> We would not, however, be able to make it a per-bump option with the way we currently plan on storing the circ and hold counts, so instead it would function as an overall modifier to the circ/hold count numbers. Though even as I type this email I have thoughts on how we could change that if the feeling is that it should be at least partially bump-to-bump configurable.

I think it's really only useful for some bump types in any case. The ratio bumps are really point-in-time values -- they represent right this very moment (or late last night, I guess). Threshold bumps don't attempt to take scale into account, just that some line was crossed. For circs this year or holds this month, or similar, age scaling (probably a better term than just aging) of each event's relevance should be useful.

--miker

--
Mike Rylander
 | Director of Research and Development
 | Equinox Software, Inc. / Your Library's Guide to Open Source
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: mi...@esilibrary.com
 | web: http://www.esilibrary.com
[OPEN-ILS-GENERAL] Activity metric for relevance
Hi all,

MassLNC is working with our partners at MVLC to develop an activity metric (aka popularity metric) that will allow sites to rank more popular items a little higher in search results than items that don't see as much activity. I've raised this idea on the list before. Although Evergreen allows sites to adjust relevancy based on the appearance of keywords in certain fields, which is highly useful, our hope is that this additional functionality will lead to further improvement when ranking results by relevance.

As an example, if a user were conducting a keyword search on abraham lincoln, there are many titles in most US libraries where the words abraham lincoln show up in the title. There would be no way to tease out the titles that are getting the most attention by readers. In fact, a title like Team of Rivals ranks very low in our search results even though there is a high likelihood it is the title the patron is seeking. By applying a metric based on activity, we might be able to see those more-recently popular titles floating higher in the search results list.

I would like to share MVLC's proposal outlining the details for implementing this project. The proposal is available at http://masslnc.cwmars.org/node/2757. It provides a lot of flexibility in allowing sites to define what high activity means to them. Circulation activity, holds activity, total copies, and publication age/bib record age can all be used as an activity metric.

If you have any feedback or questions, feel free to let us know.

Kathy

--
Kathy Lussier
Project Coordinator
Massachusetts Library Network Cooperative
(508) 343-0128
kluss...@masslnc.org
Twitter: http://www.twitter.com/kmlussier
Re: [OPEN-ILS-GENERAL] Activity metric for relevance
Am I correct in assuming that the bump amount is essentially the weight that bump will have when adding up the total effect of the bumps on that search?

--
Rogan Hamby, MLS, CCNP, MIA
Managers Headquarters Library and Reference Services, York County Library System

You can never get a cup of tea large enough or a book long enough to suit me.
-- C.S. Lewis http://www.goodreads.com/author/show/1069006.C_S_Lewis
Re: [OPEN-ILS-GENERAL] Activity metric for relevance
Hi Rogan,

Yes, that's right. According to Thomas Berezansky, if you have other relevancy bumps active from search.relevance_adjustment, they will be combined, so it won't just be the activity metrics.

Kathy

Kathy Lussier
Project Coordinator
Massachusetts Library Network Cooperative
(508) 343-0128
kluss...@masslnc.org
Twitter: http://www.twitter.com/kmlussier

On 3/14/2013 4:02 PM, Rogan Hamby wrote:
> Am I correct in assuming that the bump amount is essentially the weight that bump will have when adding up the total effect of the bumps on that search?
>
> On Thu, Mar 14, 2013 at 3:53 PM, Kathy Lussier kluss...@masslnc.org wrote:
>> Hi all, MassLNC is working with our partners at MVLC to develop an activity metric (aka popularity metric) [...]
Re: [OPEN-ILS-GENERAL] Activity metric for relevance
Kathy,

Have you considered allowing an aging parameter for some bumps, so that newer data toward the near end of the horizon is considered more important? For instance, spikes in circulation might have a larger short-term effect on relevance but, over time, would become less important while still being considered in the bump logic.

I ask because I have a simple algorithm I'm using in another project, to be debuted at the conference, that may be portable to this work.

--miker

On Thu, Mar 14, 2013 at 3:53 PM, Kathy Lussier kluss...@masslnc.org wrote:
> Hi all, MassLNC is working with our partners at MVLC to develop an activity metric (aka popularity metric) [...]

--
Mike Rylander
 | Director of Research and Development
 | Equinox Software, Inc. / Your Library's Guide to Open Source
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: mi...@esilibrary.com
 | web: http://www.esilibrary.com
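
To make the aging idea above concrete, here is a minimal sketch in Python of one way an age-scaled activity score could be computed. It is an illustration only, not Evergreen code and not necessarily the algorithm Mike refers to. It assumes a linear decay over a configurable horizon, with a floor of 1 so that older activity still counts; the function names, the 30-day default, and the use of plain dates for circulation or hold events are all hypothetical.

```python
from datetime import date


def age_weight(event_day: date, today: date, horizon_days: int) -> int:
    """Weight for one event (assumed scheme): horizon_days for an event
    today, dropping by 1 per day of age, and never going below 1."""
    # Treat any future-dated event as if it happened today.
    age = max(0, (today - event_day).days)
    return max(1, horizon_days - age)


def aged_activity(event_days, today: date, horizon_days: int = 30) -> int:
    """Sum of age weights over all events (e.g. circulations or holds)
    recorded for a single title. Recent spikes dominate the score, but
    old activity still contributes a weight of 1 per event."""
    return sum(age_weight(d, today, horizon_days) for d in event_days)
```

Under these assumptions, with a 30-day horizon and today being 2013-03-14, three circulations dated today, 10 days ago, and more than a year ago would contribute 30 + 20 + 1 = 51 to the title's score. The resulting number could then feed whatever bump logic a site configures.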