Re: [CODE4LIB] Anyone have a SUSHI client?
Tom, I am the guy who wrote sushi.py around this time last year. My apologies for the shabbiness of the code. It was meant to be primarily a proof of concept. It's definitely incomplete. I only completed the DB3 and JR1 report logic up to this point, but it would be easy enough to add other report types. You're also right that sushi.py doesn't do anything to dedupe data, but it would be very simple to write a script that reads through the SQL records and deletes dupes. You could also use the built-in UNIQUE flag in MySQL when creating your table so that duplicate records just don't get saved. If you use the CSV export functionality of sushi.py, Excel has some built-in dedupe features that would help as well. Let me know if you'd like some help modifying sushi.py. I sort of gave up on it last spring. SUSHI implementation among vendors is still pretty shabby, and there are still some weaknesses in the SUSHI standard (I wrote about them in the Nov 2012 issue of Computers in Libraries). The productivity gains I was seeing from using SUSHI ended up being pretty low. Josh Welker -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Tom Keays Sent: Friday, January 25, 2013 8:40 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Anyone have a SUSHI client? I've been looking briefly at sushi.py, as a way to orient myself to collecting stats this way. I'm not intending to single out sushi.py, but looking at it (mainly the data structure at this point, and not the code itself), raises some questions about the best approach for collecting SUSHI data. sushi.py seems to have a small number of routines; mainly to retrieve the XML file from a vendor and ingest the data in that file into a MySQL database. There are only MySQL tables for COUNTER JR1, DR1, DR2, and DR2 reports and they mirror, to a degree, the structure of the item records returned in the SUSHI xml. Here are the skeletons of 2 of the sushi.py SQL tables: counter_jr1 id int, print_issn varchar, online_issn varchar, platform varchar, item_name text, data_type varchar, date_begin datetime, date_end datetime, ft_pdf int, ft_html int, ft_total varchar counter_db3 id int, platform varchar, item_name text, data_type varchar, date_begin datetime, date_end datetime, searches int, sessions int On the face of it, this seems like a pretty good data structure (although I have a couple of concerns, that I will get to) but my main question is whether there is any agreement about how to collect this data? If I were to dig into some of the other SUSHI packages mentioned in this thread, what would I find there? Excel-formatted COUNTER reports are simply a table of columns, representing various fields, such as title (for JR1), platform, publisher (for JR1), ISSN (for JR1), etc., followed by columns for up to 12 months of the collected year, and then summary data. JR1 reports have fulltext HTML, PDF, and Total columns. DR1 has two rows, one for searches and one for sesssions, with YTD totals in the final column. Similar data structures exist for other COUNTER reports. They rely on the user to interpret them and probably ought not to inform a decision for structuring the data in a database. Is there been any best practice for how COUNTER data is modeled in a database? There are other COUNTER reports besides those four. For instance, some journal vendors do indeed report searches and sessions using the DR3 report, but others use the equivalent JR4 report, so I would have expected sushi.py to have a mechanism to collect these. Does SUSHI only deliver JR1, DR1, DR2, and DR2 reports, or is this a problem with sushi.py? Now, one of the selling points for SUSHI is that if a vendor ever advises that you should re-collect data for a given time period, the xml you receive is structured such that the act of collecting OUGHT TO update, rather than duplicate, data previously collected. However in sushi.py's SQL structure, which gives every row a unique (auto-incremented) ID number, there would have to be logic applied during the ingest to prevent multiple instances of data collected from the same vendor for the same time period. So, that's a concern. I'm also concerned about what is represented in the ft_pdf, ft_html, and ft_total fields. In the Excel COUNTER reports, the ft_pdf, ft_html, and ft_total columns simply tabulate the YTD totals and the only way you would be able to derive a monthly breakdown would be to collect 12 monthly reports and analyze the differences from month to month -- something that most libraries don't do. I have to go back and confirm this, but I don't think the SUSHI reports are giving a month-only breakdown for those fields, so I wonder about their inclusion in that table. I guess my question is what is returned in the SUSHI xml report: monthly or yearly figures for the ft_pdf, ft_html, and ft_total fields
Re: [CODE4LIB] Anyone have a SUSHI client?
Hi Joshua, I was mainly looking at your program, not for the code, but as a way to bring myself up to speed about current practices in modeling the COUNTER data. I'm trying to avoid reinventing something that has already been well thought through. I apologize for calling out your model. You have gotten much further than I have. Some of the other respondents in this thread have set me straight on some things I was very fuzzy on going in. How go about I collecting and storing the data is still something I haven't resolved yet. I personally would prefer a Python solution, but there forces here at MPOW that suggest I should build a data repository in SharePoint. Assuming that is the case, Serial Solution's open source SUSHI harvester written in .NET might actually be the way for me to go. So, my next step is to look at their data model and see what reports they collect and store. As an aside, I'm also now wondering if de-duping is strictly necessary as long as there is a field to record the date the report was generated. De-duping (or maybe just deprecating duplicate data) could be separate from the collection process. Best, Tom On Mon, Feb 4, 2013 at 10:07 AM, Joshua Welker jwel...@sbuniv.edu wrote: Tom, I am the guy who wrote sushi.py around this time last year. My apologies for the shabbiness of the code. It was meant to be primarily a proof of concept. It's definitely incomplete. I only completed the DB3 and JR1 report logic up to this point, but it would be easy enough to add other report types. You're also right that sushi.py doesn't do anything to dedupe data, but it would be very simple to write a script that reads through the SQL records and deletes dupes. You could also use the built-in UNIQUE flag in MySQL when creating your table so that duplicate records just don't get saved. If you use the CSV export functionality of sushi.py, Excel has some built-in dedupe features that would help as well. Let me know if you'd like some help modifying sushi.py. I sort of gave up on it last spring. SUSHI implementation among vendors is still pretty shabby, and there are still some weaknesses in the SUSHI standard (I wrote about them in the Nov 2012 issue of Computers in Libraries). The productivity gains I was seeing from using SUSHI ended up being pretty low. Josh Welker -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Tom Keays Sent: Friday, January 25, 2013 8:40 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Anyone have a SUSHI client? I've been looking briefly at sushi.py, as a way to orient myself to collecting stats this way. I'm not intending to single out sushi.py, but looking at it (mainly the data structure at this point, and not the code itself), raises some questions about the best approach for collecting SUSHI data. sushi.py seems to have a small number of routines; mainly to retrieve the XML file from a vendor and ingest the data in that file into a MySQL database. There are only MySQL tables for COUNTER JR1, DR1, DR2, and DR2 reports and they mirror, to a degree, the structure of the item records returned in the SUSHI xml. Here are the skeletons of 2 of the sushi.py SQL tables: counter_jr1 id int, print_issn varchar, online_issn varchar, platform varchar, item_name text, data_type varchar, date_begin datetime, date_end datetime, ft_pdf int, ft_html int, ft_total varchar counter_db3 id int, platform varchar, item_name text, data_type varchar, date_begin datetime, date_end datetime, searches int, sessions int On the face of it, this seems like a pretty good data structure (although I have a couple of concerns, that I will get to) but my main question is whether there is any agreement about how to collect this data? If I were to dig into some of the other SUSHI packages mentioned in this thread, what would I find there? Excel-formatted COUNTER reports are simply a table of columns, representing various fields, such as title (for JR1), platform, publisher (for JR1), ISSN (for JR1), etc., followed by columns for up to 12 months of the collected year, and then summary data. JR1 reports have fulltext HTML, PDF, and Total columns. DR1 has two rows, one for searches and one for sesssions, with YTD totals in the final column. Similar data structures exist for other COUNTER reports. They rely on the user to interpret them and probably ought not to inform a decision for structuring the data in a database. Is there been any best practice for how COUNTER data is modeled in a database? There are other COUNTER reports besides those four. For instance, some journal vendors do indeed report searches and sessions using the DR3 report, but others use the equivalent JR4 report, so I would have expected sushi.py to have a mechanism to collect these. Does SUSHI only deliver JR1, DR1, DR2, and DR2 reports, or is this a problem
Re: [CODE4LIB] Anyone have a SUSHI client?
Tom, I'm glad you are getting some help from looking at my code. That was the reason I put it up. When I was designing a SUSHI client last year, there was virtually nothing available to show how to do it, so I was hoping sushi.py would prevent someone from having to stumble in the dark quite as much. Funny you mention your organization wanting you to use Sharepoint. We also originally used Sharepoint for storing some of our operational data, but in the end, Sharepoint is such a closed environment that it wasn't able to meet our needs. However, even if you are 100% locked into Sharepoint, you should still be able to use Python code to access the SUSHI web services and then store it in Sharepoint using some of its APIs (if you can wade through Sharepoint's docs enough to figure out those APIs). Are you familiar with XAMPP? It's a light-weight Windows package that lets you run MySQL and some other stuff on a desktop computer very easily. Even if you don't have access to an official web server from your institution, you could still run sushi.py just on your local computer and have it store the data in your local MySQL database. Then you could just create some backups of that database and store them as files on Sharepoint or a shared drive or whatever method you have available. Just a thought. Josh Welker -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Tom Keays Sent: Monday, February 04, 2013 9:39 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Anyone have a SUSHI client? Hi Joshua, I was mainly looking at your program, not for the code, but as a way to bring myself up to speed about current practices in modeling the COUNTER data. I'm trying to avoid reinventing something that has already been well thought through. I apologize for calling out your model. You have gotten much further than I have. Some of the other respondents in this thread have set me straight on some things I was very fuzzy on going in. How go about I collecting and storing the data is still something I haven't resolved yet. I personally would prefer a Python solution, but there forces here at MPOW that suggest I should build a data repository in SharePoint. Assuming that is the case, Serial Solution's open source SUSHI harvester written in .NET might actually be the way for me to go. So, my next step is to look at their data model and see what reports they collect and store. As an aside, I'm also now wondering if de-duping is strictly necessary as long as there is a field to record the date the report was generated. De-duping (or maybe just deprecating duplicate data) could be separate from the collection process. Best, Tom On Mon, Feb 4, 2013 at 10:07 AM, Joshua Welker jwel...@sbuniv.edu wrote: Tom, I am the guy who wrote sushi.py around this time last year. My apologies for the shabbiness of the code. It was meant to be primarily a proof of concept. It's definitely incomplete. I only completed the DB3 and JR1 report logic up to this point, but it would be easy enough to add other report types. You're also right that sushi.py doesn't do anything to dedupe data, but it would be very simple to write a script that reads through the SQL records and deletes dupes. You could also use the built-in UNIQUE flag in MySQL when creating your table so that duplicate records just don't get saved. If you use the CSV export functionality of sushi.py, Excel has some built-in dedupe features that would help as well. Let me know if you'd like some help modifying sushi.py. I sort of gave up on it last spring. SUSHI implementation among vendors is still pretty shabby, and there are still some weaknesses in the SUSHI standard (I wrote about them in the Nov 2012 issue of Computers in Libraries). The productivity gains I was seeing from using SUSHI ended up being pretty low. Josh Welker -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Tom Keays Sent: Friday, January 25, 2013 8:40 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Anyone have a SUSHI client? I've been looking briefly at sushi.py, as a way to orient myself to collecting stats this way. I'm not intending to single out sushi.py, but looking at it (mainly the data structure at this point, and not the code itself), raises some questions about the best approach for collecting SUSHI data. sushi.py seems to have a small number of routines; mainly to retrieve the XML file from a vendor and ingest the data in that file into a MySQL database. There are only MySQL tables for COUNTER JR1, DR1, DR2, and DR2 reports and they mirror, to a degree, the structure of the item records returned in the SUSHI xml. Here are the skeletons of 2 of the sushi.py SQL tables: counter_jr1 id int, print_issn varchar, online_issn varchar, platform varchar, item_name text, data_type varchar
Re: [CODE4LIB] Anyone have a SUSHI client?
I've been looking briefly at sushi.py, as a way to orient myself to collecting stats this way. I'm not intending to single out sushi.py, but looking at it (mainly the data structure at this point, and not the code itself), raises some questions about the best approach for collecting SUSHI data. sushi.py seems to have a small number of routines; mainly to retrieve the XML file from a vendor and ingest the data in that file into a MySQL database. There are only MySQL tables for COUNTER JR1, DR1, DR2, and DR2 reports and they mirror, to a degree, the structure of the item records returned in the SUSHI xml. Here are the skeletons of 2 of the sushi.py SQL tables: counter_jr1 id int, print_issn varchar, online_issn varchar, platform varchar, item_name text, data_type varchar, date_begin datetime, date_end datetime, ft_pdf int, ft_html int, ft_total varchar counter_db3 id int, platform varchar, item_name text, data_type varchar, date_begin datetime, date_end datetime, searches int, sessions int On the face of it, this seems like a pretty good data structure (although I have a couple of concerns, that I will get to) but my main question is whether there is any agreement about how to collect this data? If I were to dig into some of the other SUSHI packages mentioned in this thread, what would I find there? Excel-formatted COUNTER reports are simply a table of columns, representing various fields, such as title (for JR1), platform, publisher (for JR1), ISSN (for JR1), etc., followed by columns for up to 12 months of the collected year, and then summary data. JR1 reports have fulltext HTML, PDF, and Total columns. DR1 has two rows, one for searches and one for sesssions, with YTD totals in the final column. Similar data structures exist for other COUNTER reports. They rely on the user to interpret them and probably ought not to inform a decision for structuring the data in a database. Is there been any best practice for how COUNTER data is modeled in a database? There are other COUNTER reports besides those four. For instance, some journal vendors do indeed report searches and sessions using the DR3 report, but others use the equivalent JR4 report, so I would have expected sushi.py to have a mechanism to collect these. Does SUSHI only deliver JR1, DR1, DR2, and DR2 reports, or is this a problem with sushi.py? Now, one of the selling points for SUSHI is that if a vendor ever advises that you should re-collect data for a given time period, the xml you receive is structured such that the act of collecting OUGHT TO update, rather than duplicate, data previously collected. However in sushi.py's SQL structure, which gives every row a unique (auto-incremented) ID number, there would have to be logic applied during the ingest to prevent multiple instances of data collected from the same vendor for the same time period. So, that's a concern. I'm also concerned about what is represented in the ft_pdf, ft_html, and ft_total fields. In the Excel COUNTER reports, the ft_pdf, ft_html, and ft_total columns simply tabulate the YTD totals and the only way you would be able to derive a monthly breakdown would be to collect 12 monthly reports and analyze the differences from month to month -- something that most libraries don't do. I have to go back and confirm this, but I don't think the SUSHI reports are giving a month-only breakdown for those fields, so I wonder about their inclusion in that table. I guess my question is what is returned in the SUSHI xml report: monthly or yearly figures for the ft_pdf, ft_html, and ft_total fields? Tom
Re: [CODE4LIB] Anyone have a SUSHI client?
Hi Bill, There's a lightweight python client: http://sourceforge.net/projects/sushipy/ (I haven't used it, just know *of* it) Thanks, James James Van Mil Collections Electronic Resources Librarian University of Cincinnati Libraries Telephone: (513)556-1410 vanmi...@ucmail.uc.edu -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Bill Dueber Sent: Wednesday, January 23, 2013 5:44 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] Anyone have a SUSHI client? [Background: SUSHI http://www.niso.org/committees/SUSHI/SUSHI_comm.htmlis a SOAP protocol for getting data on use of electronic resources in the COUNTER format] I'm just starting to look at trying to get COUNTER data via SUSHI into our data warehouse, and I'm discovering that apparently no one has worked on a SUSHI client since late 2009. UnlessI'm missing one? Anyone out there using SUSHI and have a client that works and is up-to-date and has some documentation of some sort? I'd prefer ruby or java, but will take anything that'll run under linux (i.e., not C#) at this point. I'm desperately trying not to have to deal with the raw SOAP and parsing the XML and such, so any help would be appreciated. -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Anyone have a SUSHI client?
The one I know of is http://code.google.com/p/sushicounterclient/ which is offered by Serial Solutions. It's a .NET framework. On Thu, Jan 24, 2013 at 8:28 AM, Van Mil, James (vanmiljf) vanmi...@ucmail.uc.edu wrote: Hi Bill, There's a lightweight python client: http://sourceforge.net/projects/sushipy/ (I haven't used it, just know *of* it) Thanks, James James Van Mil Collections Electronic Resources Librarian University of Cincinnati Libraries Telephone: (513)556-1410 vanmi...@ucmail.uc.edu -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Bill Dueber Sent: Wednesday, January 23, 2013 5:44 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] Anyone have a SUSHI client? [Background: SUSHI http://www.niso.org/committees/SUSHI/SUSHI_comm.htmlis a SOAP protocol for getting data on use of electronic resources in the COUNTER format] I'm just starting to look at trying to get COUNTER data via SUSHI into our data warehouse, and I'm discovering that apparently no one has worked on a SUSHI client since late 2009. UnlessI'm missing one? Anyone out there using SUSHI and have a client that works and is up-to-date and has some documentation of some sort? I'd prefer ruby or java, but will take anything that'll run under linux (i.e., not C#) at this point. I'm desperately trying not to have to deal with the raw SOAP and parsing the XML and such, so any help would be appreciated. -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Anyone have a SUSHI client?
Hey. NISO has a list of SUSHI tools. http://www.niso.org/workrooms/sushi/tools/ Tom
Re: [CODE4LIB] Anyone have a SUSHI client?
From the NISO list, JISC's SUSHI Starter, written in PHP, looks pretty good. http://cclibweb-4.dmz.cranfield.ac.uk/projects/sushistarters/ On Thu, Jan 24, 2013 at 9:10 AM, Tom Keays tomke...@gmail.com wrote: Hey. NISO has a list of SUSHI tools. http://www.niso.org/workrooms/sushi/tools/ Tom
Re: [CODE4LIB] Anyone have a SUSHI client?
We also have a developers listserv, if you run into any challenges: http://www.niso.org/lists/sushidevelopers/ (I'm on the SUSHI maintenance committee) Thanks, James James Van Mil Collections Electronic Resources Librarian University of Cincinnati Libraries Telephone: (513)556-1410 vanmi...@ucmail.uc.edu
Re: [CODE4LIB] Anyone have a SUSHI client?
Yeah -- I found that right away. Most of what's there appears to be abandonware. On Thu, Jan 24, 2013 at 9:10 AM, Tom Keays tomke...@gmail.com wrote: Hey. NISO has a list of SUSHI tools. http://www.niso.org/workrooms/sushi/tools/ Tom -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Anyone have a SUSHI client?
It looks like the Metridoc project might have one: https://code.google.com/p/metridoc/source/search?q=sushiorigq=sushibtnG=Search+Trunk No idea if it's working, but I'd be really interested in hearing an update on Metridoc - if Thomas or anyone else involved is listening. Jason Stirnaman Digital Projects Librarian A.R. Dykes Library University of Kansas Medical Center 913-588-7319 From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Bill Dueber [b...@dueber.com] Sent: Thursday, January 24, 2013 9:36 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Anyone have a SUSHI client? Yeah -- I found that right away. Most of what's there appears to be abandonware. On Thu, Jan 24, 2013 at 9:10 AM, Tom Keays tomke...@gmail.com wrote: Hey. NISO has a list of SUSHI tools. http://www.niso.org/workrooms/sushi/tools/ Tom -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Anyone have a SUSHI client?
Hi Jason et al, No idea if Tommy is about but we've sort of started using Metridoc at NCSU. I'm serious about that sort of ... It's kind of morphed in the last little while to become a set of Grails plugins, and I spent a lot of time fighting with Grails (although I see what they're going for and I like that idea -- think something like Rails, but based on Groovy/the JVM). I can say that not all of the plugins (each export tool is packaged as a plugin, roughly) are in really good shape, I think the sushi component has not been updated in a while. There is a google group (https://groups.google.com/forum/#!forum/metridoc) with ... well, not much activity, but I expect it to pick up in the not too distant future: there are some conversations among libraries on the OLE project to see about using it as a data warehousing tool therefor. cheers, AC On Thu, Jan 24, 2013 at 2:11 PM, Jason Stirnaman jstirna...@kumc.eduwrote: It looks like the Metridoc project might have one: https://code.google.com/p/metridoc/source/search?q=sushiorigq=sushibtnG=Search+Trunk No idea if it's working, but I'd be really interested in hearing an update on Metridoc - if Thomas or anyone else involved is listening. Jason Stirnaman Digital Projects Librarian A.R. Dykes Library University of Kansas Medical Center 913-588-7319 From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Bill Dueber [b...@dueber.com] Sent: Thursday, January 24, 2013 9:36 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Anyone have a SUSHI client? Yeah -- I found that right away. Most of what's there appears to be abandonware. On Thu, Jan 24, 2013 at 9:10 AM, Tom Keays tomke...@gmail.com wrote: Hey. NISO has a list of SUSHI tools. http://www.niso.org/workrooms/sushi/tools/ Tom -- Bill Dueber Library Systems Programmer University of Michigan Library
[CODE4LIB] Anyone have a SUSHI client?
[Background: SUSHI http://www.niso.org/committees/SUSHI/SUSHI_comm.htmlis a SOAP protocol for getting data on use of electronic resources in the COUNTER format] I'm just starting to look at trying to get COUNTER data via SUSHI into our data warehouse, and I'm discovering that apparently no one has worked on a SUSHI client since late 2009. UnlessI'm missing one? Anyone out there using SUSHI and have a client that works and is up-to-date and has some documentation of some sort? I'd prefer ruby or java, but will take anything that'll run under linux (i.e., not C#) at this point. I'm desperately trying not to have to deal with the raw SOAP and parsing the XML and such, so any help would be appreciated. -- Bill Dueber Library Systems Programmer University of Michigan Library