Re: [OPEN-ILS-GENERAL] Activity metric for relevance

2013-06-14 Thread Mike Rylander
All,

I don't have much time to go into detail right now (meetings solid
this afternoon), but to get the basics out there so Thomas and others
can start looking at it, I'm going to paste some SQL below and then
give a basic narrative on how it could be generalized for use as a
precalculated age scaling modification to various possible popularity
metrics.

The original is based on a direct rating table, where things to be
rated are rated by users with values between, say, 1 and 5, and the
time of each rating is recorded.  More recent ratings are considered
more important, and the age scaling horizon defines a linear decline
in importance such that, given an age scaling horizon of 30 days, a
rating provided today is 30 times more important than a rating
provided 30 days ago (or any older than that).  In this scheme, all
ratings are counted for all time, but cutting them off at some
secondary, older age would be trivial.

-

CREATE TABLE rating (
    id       SERIAL    PRIMARY KEY,
    usr      INT       NOT NULL REFERENCES actor.usr (id),
    thing    INT       NOT NULL REFERENCES thing (id), -- perhaps biblio.record_entry ...
    rating   INT       NOT NULL,
    created  TIMESTAMP NOT NULL DEFAULT NOW(),
    comments TEXT
);

CREATE VIEW derived_rating AS
WITH setting(max_age) AS (
    -- Age scaling horizon in days; a value of '0' is treated as 1.
    SELECT  COALESCE(NULLIF(value,'0'),'1')::INT::TEXT AS max_age
      FROM  config_flag
      WHERE name = 'rating.new_rating_bump.days'
            AND enabled
      LIMIT 1
), duplicate_list(id,dup_count) AS (
    -- Ratings inside the horizon count (max_age - age in days) times ...
    SELECT  id,
            setting.max_age::INT - DATE_PART('day', NOW() - created)::INT AS dup_count
      FROM  rating, setting
      WHERE created > NOW() - (setting.max_age || ' days')::interval
    UNION
    -- ... and ratings at or beyond the horizon count once.
    SELECT  id,
            1 AS dup_count
      FROM  rating, setting
      WHERE created <= NOW() - (setting.max_age || ' days')::interval
), sized_arrays(thing,rating_array) AS (
    -- One array per rating, holding dup_count copies of its value.
    SELECT  thing,
            ARRAY_FILL( rating, ARRAY_APPEND(NULL::INT[], duplicate_list.dup_count) ) AS rating_array
      FROM  rating
            JOIN duplicate_list USING (id)
), flattened_duplicated_ratings(thing,rating) AS (
    SELECT  thing, UNNEST( rating_array ) AS rating
      FROM  sized_arrays
)
SELECT  thing.id AS thing, AVG( r.rating ) AS rating
  FROM  thing
        LEFT JOIN flattened_duplicated_ratings AS r ON thing.id = r.thing
  GROUP BY 1;
--
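
To make the weighting concrete: duplicating each rating N times and then
averaging, as the view above does, is the same as taking a weighted
average with weight N.  The following is a minimal Python sketch of that
arithmetic only, not part of the original code; the function names
age_weight and derived_rating and the sample ratings are made up for
illustration.

```python
from datetime import datetime, timedelta


def age_weight(created, now, max_age_days):
    """Weight of one rating: max_age_days for a rating made today, one less
    per whole day of age, and never below 1 (ratings past the horizon)."""
    age_days = (now - created).days
    return max(max_age_days - age_days, 1)


def derived_rating(ratings, now, max_age_days=30):
    """Weighted average of (value, created) pairs; None if there are none."""
    total = 0
    weight_sum = 0
    for value, created in ratings:
        weight = age_weight(created, now, max_age_days)
        total += value * weight
        weight_sum += weight
    return total / weight_sum if weight_sum else None


now = datetime(2013, 6, 14, 12, 0)
ratings = [
    (5, now),                       # rated today: weight 30
    (1, now - timedelta(days=45)),  # past the 30-day horizon: weight 1
]
print(derived_rating(ratings, now))  # (5*30 + 1*1) / 31
```

With a 30-day horizon the fresh 5 dominates the stale 1, which is the
behavior described in the narrative above.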

This obviously does not work directly for things like holds or circs,
where there is no inherent quality but simply existence.  However,
if we decided on a granularity -- days or weeks for holds, and weeks
or months for circs, perhaps -- we can transform existence into
quality.  Consider hold count as a percentage of total holds per
granularity interval as a normalizing factor:

-

CREATE VIEW daily_hold_popularity AS
  WITH
    bib_count_by_date(thing, count, created) AS (
      -- Holds placed per bib record per day.
      SELECT rhrr.bib_record AS thing, COUNT(ahr.id) AS count,
             DATE(ahr.request_time) AS created
        FROM reporter.hold_request_record AS rhrr
             JOIN action.hold_request AS ahr ON (ahr.id = rhrr.id)
        GROUP BY 1,3
    ),
    total_by_date(count, created) AS (
      -- All holds placed per day, used to normalize the per-bib counts.
      SELECT COUNT(id) AS count, DATE(ahr.request_time) AS created
        FROM action.hold_request AS ahr
        GROUP BY 2
    ),
    -- A factor of 1 yields a fraction between 0 and 1; 100 would yield a percentage.
    scaling_factor(value) AS ( SELECT 1 AS value )
  SELECT bib_count_by_date.thing AS id,
         (( bib_count_by_date.count::DECIMAL / total_by_date.count) * scaling_factor.value) AS rating,
         bib_count_by_date.created
    FROM bib_count_by_date JOIN total_by_date USING (created), scaling_factor;

-

The scaling factor would allow one to change the range of possible
values that holds contribute (with a factor of 1, as above, the range
is 0-1; a factor of 100 would give 0-100), and could be pulled from a
global flag.  Use this view in the query above instead of the rating
table and hold percentage by bib id would now be your rating value.
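
As a rough illustration of the normalization, here is a small Python
sketch that turns raw hold events into a per-day share of holds for each
bib.  It mirrors the idea of the view rather than its exact SQL; the
function name daily_hold_popularity is borrowed from the view, and the
bib ids and dates are invented sample data.

```python
from collections import Counter
from datetime import date


def daily_hold_popularity(holds, scaling_factor=1):
    """holds: iterable of (bib_id, request_date) pairs, one per hold.
    Returns {(bib_id, request_date): share of that day's holds * scaling_factor}."""
    holds = list(holds)
    per_bib_day = Counter(holds)                    # holds per bib per day
    per_day = Counter(day for _bib, day in holds)   # all holds per day
    return {
        (bib, day): count / per_day[day] * scaling_factor
        for (bib, day), count in per_bib_day.items()
    }


holds = [
    (101, date(2013, 6, 13)),
    (101, date(2013, 6, 13)),
    (101, date(2013, 6, 13)),
    (202, date(2013, 6, 13)),
    (202, date(2013, 6, 14)),
]
popularity = daily_hold_popularity(holds, scaling_factor=100)
for key in sorted(popularity):
    print(key, popularity[key])
```

Each (bib, day) value can then be treated as a dated "rating" and fed
through the same age scaling as a user rating.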

Circs could be used in a similar fashion, as could acquisition date of
copies (perhaps normalized to week-of-year, or month).

This is, again, a pretty simple, linear age scaling algorithm, but
exponential or even quadratic functions are possible, and possibly
useful.
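
For comparison, the weight functions might look like the Python sketch
below.  Only the linear one corresponds to the SQL above; the quadratic
and exponential variants, and the half_life_days parameter, are
hypothetical examples of what such alternatives could be, not something
taken from existing code.

```python
def linear_weight(age_days, horizon_days):
    """Linear scaling as in the view: horizon_days today, with a floor of 1."""
    return max(horizon_days - age_days, 1)


def quadratic_weight(age_days, horizon_days):
    """Hypothetical quadratic variant: the square of the linear weight."""
    return linear_weight(age_days, horizon_days) ** 2


def exponential_weight(age_days, half_life_days):
    """Hypothetical exponential variant: the weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


for age in (0, 7, 14, 30, 60):
    print(age,
          linear_weight(age, 30),
          quadratic_weight(age, 30),
          exponential_weight(age, 7))
```

Note that exponential weights are fractional, so they cannot be applied
by duplicating rows; the average would have to be computed as a sum of
weighted ratings divided by the sum of the weights.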

Comments? Thoughts?

--miker



Re: [OPEN-ILS-GENERAL] Activity metric for relevance

2013-03-26 Thread Rogan Hamby
I'm very fond of the idea myself and shared it with a group of SCLENDS
libraries a few weeks ago and a lot of ears perked up.


On Tue, Mar 26, 2013 at 3:15 PM, Kathy Lussier kluss...@masslnc.org wrote:

  Hi all,

 Thanks to everyone for their feedback to this project! Mike, we hadn't
 considered the aging parameters as you described it, but I think it's an
 excellent idea and it sounds like others agree. Let's all put our heads
 together to see if we can make it happen. This is what I love about this
 community! :)

 Cheers!
 Kathy

 Kathy Lussier
 Project Coordinator
 Massachusetts Library Network Cooperative
 (508) 343-0128
 kluss...@masslnc.org

 Twitter: http://www.twitter.com/kmlussier


Re: [OPEN-ILS-GENERAL] Activity metric for relevance

2013-03-15 Thread Hardy, Elaine
First off, I think this is a good enhancement and would be very helpful to
all users particularly when searching a large database - a keyword search
for Abraham Lincoln returns over 2400 hits when searching all of PINES... 263
for one of the larger systems.

 

Mike - would your algorithm mean that, with a title like Team of Rivals
(which had an initial high rate of circulation and holds, then interest
fell off to revive again with the movie), while it is first popular it
rises in the search results, to fall as interest tapers, and then arises
again, using older and newer interest data? That way it might bump above
other titles that were popular in that interval more quickly than without
the algorithm?

 

Elaine



J. Elaine Hardy
PINES Bibliographic Projects & Metadata Manager
Georgia Public Library Service
1800 Century Place, Ste 150
Atlanta, Ga. 30345-4304

404.235-7128
404.235-7201, fax
eha...@georgialibraries.org
www.georgialibraries.org
www.georgialibraries.org/pines




From: open-ils-general-boun...@list.georgialibraries.org
[mailto:open-ils-general-boun...@list.georgialibraries.org] On Behalf Of
Mike Rylander
Sent: Thursday, March 14, 2013 10:11 PM
To: Evergreen Discussion Group
Subject: Re: [OPEN-ILS-GENERAL] Activity metric for relevance

 

Kathy,

 

Have you considered allowing an aging parameter for some bumps, so that
newer data toward the near end of the horizon is considered more
important? For instance, spikes in circulation might have a larger short
term effect on relevance, but over time, while still being factored into
relevance, would be less important though still considered in the bump
logic.  I ask because I have a simple algorithm I'm using in another
project, to be debuted at the conference, that may be portable to this
work.

 

--miker

 

 




Re: [OPEN-ILS-GENERAL] Activity metric for relevance

2013-03-15 Thread Thomas Berezansky
The current plan would not take into account how recent the circs (or  
holds) were, just that they were within a configurable time period of  
the time the cronjob that counts them last ran (default will likely be  
to include those from within the last 6 to 12 months). If you have an  
algorithm you think would work well and are willing to share we would  
gladly include that as an option when doing the work, though.


We would not, however, be able to make it a per-bump option with the  
way we currently plan on storing the circ and hold counts, so instead  
it would function as an overall modifier to the circ/hold count  
numbers. Though even as I type this email I have thoughts on how we  
could change that if the feeling is that it should be at least  
partially bump-to-bump configurable.


Thomas Berezansky
Merrimack Valley Library Consortium


Quoting Mike Rylander mrylan...@gmail.com:


Kathy,

Have you considered allowing an aging parameter for some bumps, so that
newer data toward the near end of the horizon is considered more important?
For instance, spikes in circulation might have a larger short term effect
on relevance, but over time, while still being factored into relevance,
would be less important though still considered in the bump logic.  I ask
because I have a simple algorithm I'm using in another project, to be
debuted at the conference, that may be portable to this work.

--miker









Re: [OPEN-ILS-GENERAL] Activity metric for relevance

2013-03-15 Thread Rogan Hamby
Something that took into consideration how current the circs/holds were
could be valuable I think in reflecting a more ... natural effect that
titles have in relevancy.

For example, Gone Girl by Gillian Flynn.  In the first 24 hours that the
bib was in our system the total holds might be relatively small compared to
some other titles during the last 6 months or a year.  Now, by the end of a
week the holds had skyrocketed and would show up well even in a search just
for girl under the proposed approach.

I think taking into consideration the timeliness of those weights would
give a more natural reflection of the title rising to the top over the
course of the week (and significantly higher that first day).










-- 

Rogan Hamby, MLS, CCNP, MIA
Managers Headquarters Library and Reference Services,
York County Library System

You can never get a cup of tea large enough or a book long enough to suit
me.
-- C.S. Lewis http://www.goodreads.com/author/show/1069006.C_S_Lewis


Re: [OPEN-ILS-GENERAL] Activity metric for relevance

2013-03-15 Thread Mike Rylander
On Fri, Mar 15, 2013 at 9:00 AM, Hardy, Elaine
eha...@georgialibraries.org wrote:

 First off, I think this is a good enhancement and would be very helpful to
 all users particularly when searching a large database – a keyword search
 for Abraham Lincoln returns over 2400 hits when searching all of PINES…263
 for one of the larger systems. 


 Mike – would your algorithm mean that, with a title like Team of Rivals
 (which had an initial high rate of circulation and holds, then interest
 fell off to revive again with the movie), while it is first popular it
 rises in the search results, to fall as interest tapers, and then arises
 again, using older and newer interest data? That way it might “bump” above
 other titles that were popular in that interval more quickly than without
 the algorithm?




That's the intent, yes.

--miker






-- 
Mike Rylander
 | Director of Research and Development
 | Equinox Software, Inc. / Your Library's Guide to Open Source
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com


Re: [OPEN-ILS-GENERAL] Activity metric for relevance

2013-03-15 Thread Mike Rylander
On Fri, Mar 15, 2013 at 9:01 AM, Thomas Berezansky tsb...@mvlc.org wrote:

 The current plan would not take into account how recent the circs (or
 holds) were, just that they were within a configurable time period of the
 time the cronjob that counts them last ran (default will likely be to
 include those from within the last 6 to 12 months). If you have an
 algorithm you think would work well and are willing to share we would
 gladly include that as an option when doing the work, though.


I do, and I am.  As time permits over the next few weeks I'll get back to
this thread.


 We would not, however, be able to make it a per-bump option with the way
 we currently plan on storing the circ and hold counts, so instead it would
 function as an overall modifier to the circ/hold count numbers. Though even
 as I type this email I have thoughts on how we could change that if the
 feeling is that it should be at least partially bump-to-bump configurable.


I think it's really only useful for some bump types in any case.  The ratio
bumps are really point-in-time values -- they represent right this very
moment (or late last night, I guess). Threshold bumps don't attempt to
take scale into account, just that some line was crossed.  For circs this
year or holds this month, or similar, age scaling (probably a better term
than just aging) of each event's relevance should be useful.

--miker








-- 
Mike Rylander
 | Director of Research and Development
 | Equinox Software, Inc. / Your Library's Guide to Open Source
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com


[OPEN-ILS-GENERAL] Activity metric for relevance

2013-03-14 Thread Kathy Lussier

Hi all,

MassLNC is working with our partners at MVLC to develop an activity 
metric (aka popularity metric) that will allow sites to rank more 
popular items a little higher in search results than items that don't 
see as much activity. I've raised this idea on the list before. Although 
Evergreen allows sites to adjust relevancy based on the appearance of 
keywords in certain fields, which is highly useful, our hope is that 
this additional functionality will lead to further improvement when 
ranking results by relevance.


As an example, if a user were conducting a keyword search on "abraham 
lincoln", there are many titles in most US libraries where the words 
"abraham lincoln" show up in the title. There would be no way to tease 
out the titles that are getting the most attention from readers. In fact, 
a title like Team of Rivals ranks very low in our search results even 
though there is a high likelihood it is the title the patron is 
seeking. By applying a metric based on activity, we might be able to 
see those more recently popular titles floating higher in the search 
results list.


I would like to share MVLC's proposal outlining the details for 
implementing this project. The proposal is available at 
http://masslnc.cwmars.org/node/2757. It provides a lot of flexibility in 
allowing sites to define what high activity means to them. Circulation 
activity, holds activity, total copies, and publication age/bib record 
age can all be used as an activity metric.


If you have any feedback or questions, feel free to let us know.

Kathy

--
Kathy Lussier
Project Coordinator
Massachusetts Library Network Cooperative
(508) 343-0128
kluss...@masslnc.org
Twitter: http://www.twitter.com/kmlussier



Re: [OPEN-ILS-GENERAL] Activity metric for relevance

2013-03-14 Thread Rogan Hamby
Am I correct in assuming that the bump amount is essentially the weight
that bump will have when adding up the total effect of the bumps on that
search?


On Thu, Mar 14, 2013 at 3:53 PM, Kathy Lussier kluss...@masslnc.org wrote:

 Hi all,

 MassLNC is working with our partners at MVLC to develop an activity metric
 (aka popularity metric) that will allow sites to rank more popular items a
 little higher in search results than items that don't see as much activity.
 I've raised this idea on the list before. Although Evergreen allows sites
 to adjust relevancy based on the appearance of keywords in certain fields,
 which is highly useful, our hope is that this additional functionality will
 lead to further improvement when ranking results by relevance.

 As an example, if a user were conducting a keyword search on abraham
 lincoln,  there are many titles in most US libraries where the words
 abraham lincoln show up in the title. There would be no way to tease out
 the titles that are getting the most attention by readers. In fact, a title
 like Team of Rivals ranks very low in our search results even though
 there is a high likelihood it is the title the patron is seeking.  By
 applying a metric based on activity, we might be able to see those
 more-recently popular titles floating higher in the search results list.

 I would like to share MVLC's proposal outlining the details for
 implementing this project. The proposal is available at
 http://masslnc.cwmars.org/node/2757.
 It provides a lot of flexibility in allowing sites to define what high
 activity means to them. Circulation activity, holds activity, total
 copies, and publication age/bib record age can all be used as an activity
 metric.

 If you have any feedback or questions, feel free to let us know.

 Kathy

 --
 Kathy Lussier
 Project Coordinator
 Massachusetts Library Network Cooperative
 (508) 343-0128
 kluss...@masslnc.org
 Twitter: http://www.twitter.com/kmlussier




-- 

Rogan Hamby, MLS, CCNP, MIA
Managers Headquarters Library and Reference Services,
York County Library System

You can never get a cup of tea large enough or a book long enough to suit
me.
-- C.S. Lewis http://www.goodreads.com/author/show/1069006.C_S_Lewis


Re: [OPEN-ILS-GENERAL] Activity metric for relevance

2013-03-14 Thread Kathy Lussier

Hi Rogan,

Yes, that's right. According to Thomas Berezansky, if you have other 
relevancy bumps active from search.relevance_adjustment, they will be 
combined, so it won't just be the activity metrics.
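
To make the combining concrete, here is a minimal sketch. It assumes, purely for illustration, that each active bump contributes a multiplier and that all of the multipliers are applied together to a record's base relevance score. The function name, bump names, and values are hypothetical; the actual combination rule is whatever search.relevance_adjustment and the proposal define.

```python
def combined_relevance(base_score, bumps):
    """Apply every active bump multiplier to a base relevance score.

    `bumps` maps a hypothetical bump name to its multiplier; a multiplier
    of 1.0 leaves the score unchanged. This is an illustrative assumption,
    not Evergreen's actual ranking code.
    """
    score = base_score
    for multiplier in bumps.values():
        score *= multiplier
    return score


# Made-up example: one keyword-position bump plus one activity bump.
bumps = {"first_word": 1.5, "activity_circ": 2.0}
print(combined_relevance(10.0, bumps))  # 10.0 * 1.5 * 2.0 = 30.0
```

Under that assumption, a larger bump amount simply gives that bump more pull on the final ordering.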


Kathy

Kathy Lussier
Project Coordinator
Massachusetts Library Network Cooperative
(508) 343-0128
kluss...@masslnc.org
Twitter: http://www.twitter.com/kmlussier

On 3/14/2013 4:02 PM, Rogan Hamby wrote:
Am I correct in assuming that the bump amount is essentially the 
weight that bump will have when adding up the total effect of the 
bumps on that search?



On Thu, Mar 14, 2013 at 3:53 PM, Kathy Lussier kluss...@masslnc.org wrote:


Hi all,

MassLNC is working with our partners at MVLC to develop an
activity metric (aka popularity metric) that will allow sites to
rank more popular items a little higher in search results than
items that don't see as much activity. I've raised this idea on
the list before. Although Evergreen allows sites to adjust
relevancy based on the appearance of keywords in certain fields,
which is highly useful, our hope is that this additional
functionality will lead to further improvement when ranking
results by relevance.

As an example, if a user were conducting a keyword search on
abraham lincoln,  there are many titles in most US libraries
where the words abraham lincoln show up in the title. There
would be no way to tease out the titles that are getting the most
attention by readers. In fact, a title like Team of Rivals ranks
very low in our search results even though there is a high
likelihood it is the title the patron is seeking.  By applying a
metric based on activity, we might be able to see those
more-recently popular titles floating higher in the search results
list.

I would like to share MVLC's proposal outlining the details for
implementing this project. The proposal is available at
http://masslnc.cwmars.org/node/2757. It provides a lot of
flexibility in allowing sites to define what high activity means
to them. Circulation activity, holds activity, total copies, and
publication age/bib record age can all be used as an activity metric.

If you have any feedback or questions, feel free to let us know.

Kathy

-- 
Kathy Lussier

Project Coordinator
Massachusetts Library Network Cooperative
(508) 343-0128
kluss...@masslnc.org
Twitter: http://www.twitter.com/kmlussier




--

Rogan Hamby, MLS, CCNP, MIA
Managers Headquarters Library and Reference Services,
York County Library System

You can never get a cup of tea large enough or a book long enough to 
suit me.

-- C.S. Lewis http://www.goodreads.com/author/show/1069006.C_S_Lewis




Re: [OPEN-ILS-GENERAL] Activity metric for relevance

2013-03-14 Thread Mike Rylander
Kathy,

Have you considered allowing an aging parameter for some bumps, so that
newer data toward the near end of the horizon is considered more important?
For instance, a spike in circulation might have a larger short-term effect
on relevance at first and then, over time, become less important while
still being considered in the bump logic.  I ask
because I have a simple algorithm I'm using in another project, to be
debuted at the conference, that may be portable to this work.
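
As a rough illustration of the aging idea (a sketch only, not the algorithm from the other project): assume each circulation event inside a horizon of N days is weighted linearly by how recent it is, and anything older than the horizon keeps a floor weight of 1. The function names, the 30-day horizon, and the linear shape are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def age_weight(event_time, now, horizon_days=30):
    """Hypothetical linear age weight: the newest events count most.

    An event from today weighs horizon_days; the weight drops by one per
    full day of age and never falls below 1, so old activity still counts.
    """
    age_days = (now - event_time).days
    return max(horizon_days - age_days, 1)


def aged_activity(event_times, now, horizon_days=30):
    """Sum the age weights of all events (e.g. circulations) for one record."""
    return sum(age_weight(t, now, horizon_days) for t in event_times)


now = datetime(2013, 3, 14)
recent_spike = [now - timedelta(days=d) for d in (0, 1, 2)]
old_spike = [now - timedelta(days=d) for d in (90, 91, 92)]
print(aged_activity(recent_spike, now))  # 30 + 29 + 28 = 87
print(aged_activity(old_spike, now))     # 1 + 1 + 1 = 3
```

Three circulations this week outweigh three from last quarter, yet the older ones are not dropped entirely.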

--miker



On Thu, Mar 14, 2013 at 3:53 PM, Kathy Lussier kluss...@masslnc.org wrote:

 Hi all,

 MassLNC is working with our partners at MVLC to develop an activity metric
 (aka popularity metric) that will allow sites to rank more popular items a
 little higher in search results than items that don't see as much activity.
 I've raised this idea on the list before. Although Evergreen allows sites
 to adjust relevancy based on the appearance of keywords in certain fields,
 which is highly useful, our hope is that this additional functionality will
 lead to further improvement when ranking results by relevance.

 As an example, if a user were conducting a keyword search on abraham
 lincoln,  there are many titles in most US libraries where the words
 abraham lincoln show up in the title. There would be no way to tease out
 the titles that are getting the most attention by readers. In fact, a title
 like Team of Rivals ranks very low in our search results even though
 there is a high likelihood it is the title the patron is seeking.  By
 applying a metric based on activity, we might be able to see those
 more-recently popular titles floating higher in the search results list.

 I would like to share MVLC's proposal outlining the details for
 implementing this project. The proposal is available at
 http://masslnc.cwmars.org/node/2757.
 It provides a lot of flexibility in allowing sites to define what high
 activity means to them. Circulation activity, holds activity, total
 copies, and publication age/bib record age can all be used as an activity
 metric.

 If you have any feedback or questions, feel free to let us know.

 Kathy

 --
 Kathy Lussier
 Project Coordinator
 Massachusetts Library Network Cooperative
 (508) 343-0128
 kluss...@masslnc.org
 Twitter: http://www.twitter.com/kmlussier




-- 
Mike Rylander
 | Director of Research and Development
 | Equinox Software, Inc. / Your Library's Guide to Open Source
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com