Re: [CODE4LIB] U of Baltimore, Final Usability Report, link resolvers -- MIA?
I was going to comment that some of the Encore shortcomings mentioned in the PDF do seem to be addressed in current Encore versions, although some of these issues have to be addressed - for instance, there is a spell-check, but it can give some surprising suggestions, though suggestions do clue the user in to the fact that they might have a misspelling/typo.

III's reaction to studies reporting that users ignore the right-side panel of search options was to provide a skin that has only two columns - the facets on the left, and the search results on the middle-to-right. This pushes important facets like the tag cloud very far down the page and causes a lot of scrolling, so I don't like this skin much.

I recently asked a question on the Encore users' list about how the tag cloud could be improved - currently it suggests the most common subfield a of the subject headings. I would think it should include the general, chronological, and geographical subdivisions - subfields x, y, z. For instance, it doesn't provide good suggestions for improving the search "civil war" without these; a chronological subdivision would help a lot there. But then again, I haven't seen a prototype of how many relevant subdivisions this would result in - would the subdivisions drown out the main headings in the tag cloud?

Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363

On Wed, Sep 5, 2012 at 5:30 PM, Jonathan LeBreton lebre...@temple.edu wrote: Lucy Holman, Director of the U Baltimore Library and a former colleague of mine at UMBC, got back to me about this. Her reply puts this particular document into context. It is an interesting reminder that not everything you find on the web is as it seems, and it certainly is not necessarily the final word. We gotta go buy the book! Lucy is off-list, but asked me to post this on her behalf.
Her contact information is below, though. Very interesting discussion! This issue of what is right and feasible in discovery services, and how to configure them, is important stuff for many of our libraries, and we should be able to build on the findings and experiences of others rather than re-inventing the wheel locally. (We use Summon.) - Jonathan LeBreton

-- begin Lucy's explanation --

The full study and analysis are included in Chapter 14 of a new book, Planning and Implementing Resource Discovery Tools in Academic Libraries, Mary P. Popp and Diane Dallis (Eds). The project was part of a graduate Research Methods course in the University of Baltimore's MS in Interaction Design and Information Architecture program. Originally, groups within the course conducted task-based usability tests on EDS, Primo, Summon, and Encore. Unfortunately, the test environment of Encore led to many usability issues that we believed were more a result of the test environment than of the product itself; therefore we did not report on Encore in the final analysis. The study (and chapter) does offer findings on the other three discovery tools.

There were six student groups in the course; each group studied two tools with the same user population (undergrad, graduate, and faculty), so that overall each tool was compared against the other three with each user population. The .pdf that you found was the final report of one of those six groups, so it only addresses two of the four tools. The chapter is the only document that pulls the six portions of the study together. I would be happy to discuss this with any of you individually if you need more information. Thanks for your interest in the study.

Lucy Holman, DCD Director, Langsdale Library University of Baltimore 1420 Maryland Avenue Baltimore, MD 21201 410-837-4333

-- end insert --

Jonathan LeBreton Sr.
Associate University Librarian Temple University Libraries Paley M138, 1210 Polett Walk, Philadelphia PA 19122 voice: 215-204-8231 fax: 215-204-5201 mobile: 215-284-5070 email: lebre...@temple.edu email: jonat...@temple.edu

-----Original Message----- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of karim boughida Sent: Tuesday, September 04, 2012 5:09 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] U of Baltimore, Final Usability Report, link resolvers -- MIA?

Hi Tom, the top players are EDS, Primo, and Summon; the only reason I see Encore in the mix is if you have other III products, which is not the case at the UBalt library. Do they now have WorldCat? Encore vs. Summon is an easy win for Summon. Let's wait for Jonathan LeBreton (thanks, BTW). Karim Boughida

On Tue, Sep 4, 2012 at 4:26 PM, Tom Pasley tom.pas...@gmail.com wrote: Yes, I'm curious to know too! Due to database/resource matching or coverage perhaps (anyone's guess). Tom

On Wed, Sep 5, 2012 at 7:50 AM, karim boughida kbough...@gmail.com
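Cindy's tag-cloud suggestion earlier in the thread - pairing the $a main heading with its $x/$y/$z subdivisions - can be sketched roughly as below. This is a minimal illustration that assumes, for simplicity, a one-line textual rendering of a 650 field with "$" delimiters; a real implementation would read actual MARC records with a MARC library rather than split strings.

```java
import java.util.ArrayList;
import java.util.List;

public class SubjectSubdivisions {
    // Split a simplified textual subject field ("$a...$x...$y...$z...")
    // into tag-cloud entries: the main heading ($a) plus one entry per
    // general/chronological/geographic subdivision ($x/$y/$z).
    static List<String> tagCloudEntries(String field) {
        List<String> entries = new ArrayList<>();
        String main = null;
        for (String part : field.split("\\$")) {
            if (part.isEmpty()) continue;
            char code = part.charAt(0);
            String value = part.substring(1).trim();
            if (code == 'a') {
                main = value;
                entries.add(main);
            } else if ((code == 'x' || code == 'y' || code == 'z') && main != null) {
                entries.add(main + " -- " + value);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // The "civil war" case from the discussion: the chronological
        // subdivision ($y) becomes its own suggestion.
        System.out.println(tagCloudEntries(
            "$aUnited States$xHistory$yCivil War, 1861-1865"));
    }
}
```

Whether such subdivision entries would drown out the main headings, as Cindy wonders, is something only a prototype over real data could answer.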
Re: [CODE4LIB] U of Baltimore, Final Usability Report, link resolvers -- MIA?
I meant to say: some of these issues have to be addressed in configuration. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363

On Thu, Sep 6, 2012 at 9:06 AM, Cindy Harper char...@colgate.edu wrote: I was going to comment that some of the Encore shortcomings mentioned in the PDF do seem to be addressed in current Encore versions, although some of these issues have to be addressed ...
[CODE4LIB] Fwd: [LIBR-FAC] bad news about ERIC documents
Sarah Park - This is from the message our govdocs librarian sent out last week... Cindy Harper, Systems Librarian, Colgate

This bad news about full-text ERIC documents came through govdoc-l today:

"The full text documents for ERIC have been temporarily disabled due to a privacy concern. We apologize for the inconvenience and are currently working to isolate the affected documents and return full text access to users as quickly as possible. Please stay tuned to eric.ed.gov for an update on when they will become available again."

Not sure what the privacy concern is - Mary Jane Walsh, Head of Government Documents, Maps, Microforms and Interim Head of Reference, Colgate University, mwalsh at colgate dot edu
[CODE4LIB] Gadgeteers
I didn't know there were so many gadgeteers on this list. The latest item on my wishlist is this: http://wimm.com/. Now, I'm not a smartphone user, because I'm always losing my cellphone, and I can't justify the cost of a data plan. And I've looked into a wearable notepad, but I think the shoulder holster would not send quite the right message. But my ideal watch device would have the time, alarms and calendars synced to my Google calendar, and a voice recorder for voice memos to my absent-minded self. I think, with the right Android programming, this device could do it. Has anyone seen one of these? Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363
Re: [CODE4LIB] CAS authentication with ILLiad
How opportune! Colgate wants to do this, but I've been given a one-week timeframe. We have CAS all set up. Does it look like it's doable in that time? Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363

On Thu, Jan 12, 2012 at 12:51 PM, Friscia, Michael michael.fris...@yale.edu wrote: Is anyone still interested in the topic of remote authentication for ILLiad using CAS (for sites that host their own ILLiad instance)? I just completed the integration this morning without using the various UofA or UC Davis ISAPI filters out there. If there's interest, I'd be happy to share how it was done. ___ Michael Friscia Manager, Digital Library Programming Services Yale University Library (203) 432-1856
Re: [CODE4LIB] CAS authentication with ILLiad
We're running 2008 w/ IIS7. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363

On Thu, Jan 12, 2012 at 1:11 PM, Friscia, Michael michael.fris...@yale.edu wrote: Took me 4 hours start to finish: 10 minutes to make it work, 3 hours 50 minutes to convert 24k user accounts to work with it. So yes, I think it is doable. I'll see what I can put together for documentation. It will be written assuming Windows Server 2008 with IIS7. It can be done with IIS6 on Server 2003, but that would require someone who knows both pretty well. ___ Michael Friscia Manager, Digital Library Programming Services Yale University Library (203) 432-1856
[CODE4LIB] Data Mining / Business Analytics in libraries
Are there any listservs, blogs, or forums addressing data mining in libraries? I've taken some courses and am now exploring software - I just tried out RapidMiner, which integrates with R and Weka and has facilities for data cleaning and storage. I'm interested to see if anyone is sharing their experiences with business-analytics-type products in libraries. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363
Re: [CODE4LIB] Examples of visual searching or browsing
A couple years ago, I used a crossmap of LC call numbers to subject headings (admittedly out of date) to provide a subject-labeled sort by call number on an experimental catalog search: http://lisv06.colgate.edu/profound/

The mapping came from Mona Scott, Conversion Tables (1999): http://encore.colgate.edu/iii/encore/search/C%7CSmona+subject+scott%7COrightresult%7CU1?lang=eng&suite=def

I don't know how robust this is, but try searching a word that will appear across subject areas, like "brown", to see the classification/subject labels. I read the tables into a database and, in a batch process, coded each call number division by how deep into the hierarchy it was linked - the number of indents, from 1 to 6. My ambition was then to find the most frequently used subject headings at each step of the hierarchy (limited to a workable range), to try to generate some semantic-net-like set of links between subject headings and classification. But I never was able to pursue that goal.

Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363

On Sun, Oct 30, 2011 at 5:58 PM, David Friggens frigg...@waikato.ac.nz wrote: Clicking on one of Ben Shneiderman's treemapping projects reminded me that I've always thought treemaps [1] would serve well as a browsing interface for library and archive collections, because they work well with hierarchical data. I played around with this earlier in the year, wanting to provide a drill-down into our collections by call number.
For our Education Library's Teaching Collection, I used a three-level visualisation of items based on the Dewey hierarchy, coloured by the proportion of new (post-2006) items. I never put it online anywhere, so have attached it here. For Dewey it was pretty easy to get labels for the first three levels, and that seemed reasonable enough for most areas. But the majority of our items are LCC, and that's where I ran aground. The labels for the first two letters are readily available, but far too general to make this interesting. I couldn't seem to find any useful data in machine-readable format. Sourcing another level down from LoC [1] or Wikipedia [2] seems tantalisingly close, but there's a whole lot of manual effort in turning these (incomplete) ranges into something usable. Cheers, David

[1] http://www.loc.gov/catdir/cpso/lcco/
[2] http://en.wikipedia.org/wiki/Library_of_Congress_Classification

-- oʇɐʞıɐʍ ɟo ʎʇısɹǝʌıun uɐıɹɐɹqıן sɯǝʇsʎs
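The indent-depth coding Cindy describes (and the outline parsing David would need for the LCC ranges) might be sketched like this. It is a rough illustration only, under the assumption - purely hypothetical - that the classification outline is plain text with two spaces of indentation per hierarchy level, capped at the six levels Cindy mentions.

```java
public class CallNumberDepth {
    // Estimate the hierarchy depth of one line of a classification
    // outline from its leading indentation. Assumes (hypothetically)
    // two spaces per level; depth is capped at 6, matching the
    // "number of indents from 1 to 6" coding described above.
    static int depth(String line) {
        int spaces = 0;
        while (spaces < line.length() && line.charAt(spaces) == ' ') {
            spaces++;
        }
        return Math.min(spaces / 2 + 1, 6);
    }

    public static void main(String[] args) {
        System.out.println(depth("QA Mathematics"));              // top level
        System.out.println(depth("    QA76 Computer science"));   // two levels down
    }
}
```

The real effort, as David notes, is getting the outline into machine-readable form in the first place; this only covers the easy step after that.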
Re: [CODE4LIB] Examples of visual searching or browsing
Oh - it looks like the item display didn't survive the transition to IIS 7 - I'll look into that. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363

On Mon, Oct 31, 2011 at 11:49 AM, Cindy Harper char...@colgate.edu wrote: A couple years ago, I used a crossmap of LC call numbers to subject headings (admittedly out of date) to provide a subject-labeled sort by call number on an experimental catalog search ...
[CODE4LIB] Archivists' Toolkit, Timeouts and Hibernate
I'm asking you all because it's not clear to me how to interact with the AT developers directly - the response from the ATUG list is rather slow - and I'm hoping you can give me a technical explanation a la "no, because..." rather than just a "no."

We're trying to adopt Archivists' Toolkit at Colgate. We don't have a Java developer in-house, but I'm exploring whether I can learn to address minor issues myself. We're a small liberal arts college, so library policy is to outsource as much infrastructure as possible (meaning open source is generally avoided). So the MySQL database is hosted on a Lunarpages server, and I can't adjust the timeout at the server level. But I suspect that the timeout we're seeing is not a timeout of the given MySQL transaction, but instead a problem with Hibernate persistence.

The symptom: we edit a record, then proceed to child records that require much editing - the chunk of data that my people are trying to enter at one time takes over 10 minutes to edit. During their editing of the child records, an error occurs. AT has added error code to sense when this is a JDBCConnectionException, and then it forces you to restart:

    if (errorText.contains("JDBCConnectionException")) {
        String message = "Database connection has been lost due to a server timeout.\n\n"
            + "Please RESTART the program to continue. If the problem persists, consult your System Administrator.";
    }

So what I did was add a connectTimeout=3600 parameter to the SessionFactory database URL. But I still seem to have trouble with the timeout.

Now, I acknowledge that understanding Hibernate and how it interacts with JDBC, and altering code in AT, may be getting over my head, and that what I probably should try next is either putting the database on my local MS SQL Server instance or on my test-server instance of MySQL (I don't have a local production instance of MySQL), and abandoning the hosted server. But can any of you add to my knowledge base here, and tell me:

- Is it possible to correct this problem easily in the AT code?
- Is the JDBCConnectionException due to the MySQL server timeout that is set by connectTimeout?
- Is simply adding a parameter to the database URL an effective way of making sure that that parameter is used in each openSession instance?

I know I have a lot to learn about Hibernate - I've located a book to skim in Books24x7, and I'll try Wikipedia to get a briefer initial grounding. Any other advice?

Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363
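One detail worth checking: MySQL Connector/J's connectTimeout and socketTimeout properties are specified in milliseconds, so connectTimeout=3600 asks for a 3.6-second timeout, not an hour. A minimal sketch of building the JDBC URL with explicit values follows; the host and database names are made up for illustration.

```java
public class JdbcUrlExample {
    // Build a MySQL Connector/J URL with explicit timeout parameters.
    // Both connectTimeout (initial connection) and socketTimeout (reads
    // on an established connection) are in MILLISECONDS; 0 means no timeout.
    static String withTimeouts(String host, String db, long connectMs, long socketMs) {
        return "jdbc:mysql://" + host + "/" + db
            + "?connectTimeout=" + connectMs
            + "&socketTimeout=" + socketMs;
    }

    public static void main(String[] args) {
        // Hypothetical host/database: 10 s to connect, 1 h of socket idle.
        String url = withTimeouts("dbhost.example.com", "archivists_toolkit",
                                  10_000, 3_600_000);
        System.out.println(url);
    }
}
```

Whether a URL parameter actually reaches every connection Hibernate opens depends on how the SessionFactory is configured, so verifying the value in the Hibernate config file as well would be prudent.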
Re: [CODE4LIB] Archivists' Toolkit, Timeouts and Hibernate
Hi - I tried socketTimeout (I don't believe it's set in the SessionFactory code, but it may be in the Hibernate config), and then got a record-lock error after 5 minutes:

    java.lang.NullPointerException
        at org.archiviststoolkit.mydomain.DomainAccessObjectImpl.update(DomainAccessObjectImpl.java:228)
        at org.archiviststoolkit.util.RecordLockUtils.updateRecordLocksTime(RecordLockUtils.java:170)
        at org.archiviststoolkit.Main$1.run(Main.java:526)
        at java.lang.Thread.run(Unknown Source)

I'll try Chris's solution next. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363

On Thu, Oct 6, 2011 at 4:12 PM, Cowles, Esme escow...@ucsd.edu wrote: Cindy - I think connectTimeout is used for making the initial connection to the database, but the error you describe sounds more like the initial connection succeeds and then there is a timeout afterwards. I think the socketTimeout parameter is what would control the timeout during an editing session, though the docs say both connectTimeout and socketTimeout default to 0 (no timeout): http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-configuration-properties.html Is socketTimeout specified in the JDBC config, by any chance? -Esme

-- Esme Cowles escow...@ucsd.edu "In the old days, an operating system was designed to optimize the utilization of the computer's resources. In the future, its main goal will be to optimize the user's time." -- Jakob Nielsen

On 10/6/2011, at 3:05 PM, Cindy Harper wrote: I'm asking you all because it's not clear to me how to interact with the AT developers directly ...
Re: [CODE4LIB] Usage and financial data aggregation
We're a III library (yes, I know), and we're looking into the new Sierra product's API, which promises to release some of this data to us for use in a third-party product such as you describe. III does have its proprietary Encore Reporter product, but I'm predicting some Sierra sites will look for an open-source product. I'd be very interested in working with others on such an effort. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363

On Tue, Sep 13, 2011 at 5:08 PM, Jason Stirnaman jstirna...@kumc.edu wrote: Does anyone have suggestions or recommendations for platforms that can aggregate usage data from multiple sources, combine it with financial data, and then provide some analysis, graphing, data views, etc.? From what I can tell, something like Ex Libris' Alma would require all fulfillment transactions to occur within the system. I'm looking instead for something like Splunk that would accept log data, circulation data, usage reports, costs, and Sherpa/Romeo authority data, but then schematize it for data analysis and maybe push out reporting dashboards (nods to Brown Library: http://library.brown.edu/dashboard/widgets/all/). I'd also want to automate the data retrieval, so that might consist of scraping, web services, and FTP, but that could easily be handled separately. I'm aware there are many challenges, such as comparing usage stats, shifts in journal aggregators, etc. Does anyone have any cool homegrown examples or ideas they've cooked up for this? Pie in the sky? Jason

Jason Stirnaman Biomedical Librarian, Digital Projects A.R. Dykes Library, University of Kansas Medical Center jstirna...@kumc.edu 913-588-7319
Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested
So I take it this would need a fast connection between Google and your server, but would tolerate a slow connection between the user and Google? Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363

On Thu, Aug 4, 2011 at 6:03 AM, Richard Wallis richard.wal...@talis.com wrote: Why not let someone else, such as Google, do the heavy lifting for you: https://docs.google.com/viewer ~Richard.

On 4 August 2011 07:39, Dave Caroline dave.thearchiv...@gmail.com wrote: One method is to dispense with PDF and just view the scanned pages online as images or OCR'd text, or point the user to a directory with the scans for the document. He then only needs an image viewer, using a lot less of his machine's memory. Large PDFs also cause problems on the viewing computer. I was reviewing someone's 25 MB PDF the other day and it peaked at 3.3 GB of memory use, which on a box with 2.5 GB of memory meant it went into swap and slowed to a crawl. The viewer used there was Evince. I scan to JPG and only produce a PDF if nagged: http://www.collection.archivist.info/archive/manuals/IS44_Tektronix_602_display_unit/ As I serve from home and the upload is on the slow side, individual pages help there too. And when in a good mood I finish off a document thus: http://www.collection.archivist.info/searchv13.php?searchstr=lucas+tp1 where all pages are web-viewable. Been too lazy to write a page-to-page link on the page view so far (need a round tuit). Dave Caroline

-- Richard Wallis Technology Evangelist, Talis Tel: +44 (0)7767 886 005 Linkedin: http://www.linkedin.com/in/richardwallis Skype: richard.wallis1 Twitter: @rjw IM: rjw3...@hotmail.com
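Richard's suggestion boils down to handing the Google viewer page a URL-encoded link to the remotely hosted document, so the heavy rendering happens on Google's side. A minimal sketch of building such a link; the document URL here is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class GDocsViewerLink {
    // Build a Google Docs Viewer link for a remotely hosted document.
    // The target URL must be percent-encoded before being passed as
    // the url= query parameter.
    static String viewerUrl(String docUrl) throws UnsupportedEncodingException {
        return "https://docs.google.com/viewer?url="
            + URLEncoder.encode(docUrl, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical scanned manual hosted on the library's own server.
        System.out.println(viewerUrl("http://example.org/scans/manual.pdf"));
    }
}
```

This only works for documents Google's servers can fetch, which is why the Google-to-server connection speed matters and the user-to-Google leg is the forgiving one.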
Re: [CODE4LIB] Access 2011 Conference - Early Bird Reminder
For those of us unable to attend, will handouts/video be posted? Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363

On Wed, Jun 29, 2011 at 12:48 AM, Mark Jordan mjor...@sfu.ca wrote: It's only 112 days until the Access 2011 conference this fall. A friendly reminder to look at the schedule and start making plans to come to Vancouver! We're also still accepting Hackfest project suggestions: http://access2011.library.ubc.ca/hackfest/ The Early Bird registration rate is only available until August 1st: http://access2011.library.ubc.ca/registration/ At only $169/night, the conference hotel is quickly booking up. The rate is available for several days before and after the conference, but you must book through the link on the conference website: http://access2011.library.ubc.ca/hotel/ Open data, open source development and community building, digital preservation, and artful data visualization! We hope to see you here for all this and more, October 19-22, 2011 in Vancouver. Mark Jordan Access 2011 Conference Planning Committee Follow us on Twitter - @access_2011
Re: [CODE4LIB] ajaxy CRUD / weeding helper
The weeding project that we've started this year involves identifying only unneeded added copies and outdated editions. Rather than have the professional librarians examine every book on every shelf, I've suggested we prepare some lists of candidates that students can pull - titles with fewer than x uses in the past y years and more than one copy. We haven't historically recorded copy numbers on our records, so we can't tell from the data whether two item records are for the same entity, but I think student workers could probably handle checking for that. The next category - superseded editions - is more difficult to check for: they may have the same call number with a different date appended, etc. Has anyone done any work to match on author/title and identify the series of editions based on that? Or is there any other automation that would help with this weeding project? Our collection managers are skeptical that it can be automated in any way. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu
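On the superseded-editions question: a crude first pass is to normalize each record's author/title into a match key and flag keys that occur with more than one date. A sketch under the assumption that punctuation and leading English articles are the main sources of variation between editions' records; the sample records are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class EditionMatcher {
    // Normalize an author/title pair into a rough match key:
    // lowercase, drop a leading article from the title, strip punctuation.
    static String matchKey(String author, String title) {
        String t = title.toLowerCase()
                        .replaceAll("^(the|a|an)\\s+", "")
                        .replaceAll("[^a-z0-9 ]", "").trim();
        String a = author.toLowerCase()
                         .replaceAll("[^a-z0-9 ]", "").trim();
        return a + "|" + t;
    }

    public static void main(String[] args) {
        // Invented records: {author, title, edition year}.
        String[][] recs = {
            {"Smith, John", "The Economics of Education", "1998"},
            {"Smith, John.", "Economics of education", "2005"},
            {"Doe, Jane", "Organic Chemistry", "2001"},
        };
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] r : recs) {
            groups.computeIfAbsent(matchKey(r[0], r[1]),
                                   k -> new ArrayList<>()).add(r[2]);
        }
        // A key seen with more than one year is a candidate edition cluster
        // for a student worker to verify in hand.
        groups.forEach((k, years) -> {
            if (years.size() > 1) System.out.println(k + " -> " + years);
        });
    }
}
```

The output is only a list of candidates to check at the shelf, which may be all the skeptical collection managers would concede to automating anyway.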
Re: [CODE4LIB] Group-sourced Google custom search site?
That's right. I see that Google didn't provide a -1 button on their +1 button experiment. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363 On Wed, May 11, 2011 at 2:32 PM, Peter Noerr pno...@museglobal.com wrote: Just curious - what do you mean by "Some way to avoid the site-scrapers who populate the troubleshooting pages" (last sentence below)? I presume you are wishing to avoid the troubleshooting sites which consist of nothing more than pages copied from other sites, and look only at the prime source pages for information? Peter -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Cindy Harper Sent: Monday, May 02, 2011 2:15 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] Group-sourced Google custom search site? That reminds me - I was looking last week into the possibility of making a Google custom search site with either a whitelist of trusted technology sites, or a blacklist of sites to exclude. I haven't looked into whether the management of that could be group-sourced, but maybe someone else here has thought about this. I haven't looked into the terms of service of custom search sites, either. But of course slashdot was high on the whitelist. I was thinking about sites for several purposes - general technology news and opinion, or specific troubleshooting / programming sites. Some way to avoid the site-scrapers who populate the troubleshooting pages. Cindy Harper, Colgate U.
[CODE4LIB] Group-sourced Google custom search site?
That reminds me - I was looking last week into the possibility of making a Google custom search site with either a whitelist of trusted technology sites, or a blacklist of sites to exclude. I haven't looked into whether the management of that could be group-sourced, but maybe someone else here has thought about this. I haven't looked into the terms of service of custom search sites, either. But of course slashdot was high on the whitelist. I was thinking about sites for several purposes - general technology news and opinion, or specific troubleshooting / programming sites. Some way to avoid the site-scrapers who populate the troubleshooting pages. Cindy Harper, Colgate U.
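[Ed.: for what it's worth, Google's Custom Search Engine supported bulk include/exclude lists at the time via an annotations XML file uploaded through the CSE control panel. A hedged sketch only - the `Label` names are per-engine identifiers copied from one's own control panel, and "_cse_yourengineid" below is a placeholder:]

```xml
<Annotations>
  <!-- Whitelist: include everything under a trusted site. -->
  <Annotation about="slashdot.org/*">
    <Label name="_cse_yourengineid"/>
  </Annotation>
  <!-- Blacklist: engines also get an exclude label for filtering out
       scraper sites (the exact name comes from the control panel;
       the site below is a placeholder). -->
  <Annotation about="scraper-site.example.com/*">
    <Label name="_cse_exclude_yourengineid"/>
  </Annotation>
</Annotations>
```

Group-sourcing would then amount to version-controlling this file and re-uploading it.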
[CODE4LIB] Fwd: online courses- Sentiment Analysis, Text Mining
I recommend the courses from statistics.com - price reductions for educators... Do you see any possible applications for libraries? -- Forwarded message -- From: Peter Bruce ourcour...@statistics.com Date: Tue, Apr 26, 2011 at 4:09 PM Subject: online courses- Sentiment Analysis, Text Mining To: char...@mail.colgate.edu Dear ... : How are you (your organization/product/service) regarded in cyberspace? Thrifty with money and time, people are unsparing with their opinions using Twitter, Facebook, Yelp, Flixster, blogs, web forums, product reviews... Sentiment analysis is the relatively new art and science of distilling useful data from this mass of unstructured text. The first annual conference on this subject was just held in NYC (google "Sentiment Analysis Symposium"); one of the main presenters was Nitin Indurkhya and his staff from eBay. He will present two online courses at statistics.com in June and July: Jun 3 - Jul 1: Text Mining (4 weeks) Jul 8 - Jul 29: Sentiment Analysis (3 weeks) Text Mining will introduce the essential techniques of text mining - the extension of data mining's standard predictive methods to unstructured text. This course will discuss these standard predictive modeling techniques (some familiarity with these methods will help), and will devote considerable attention to the data preparation and handling methods that are required to transform unstructured text into a form in which it can be mined. Access to software is provided with the course text. Sentiment Analysis introduces you to the algorithms, techniques and software used in sentiment analysis. Their use will be illustrated by reference to existing applications, particularly product reviews and opinion mining. The course will try to make clear both the capabilities and the limitations of these applications. For real-world applications, sentiment analysis draws heavily on work in computational linguistics and text-mining.
At the completion of the course, a student will have a good idea of the field of sentiment analysis, the current state-of-the-art and the issues and problems that are likely to be the focus of future systems. Nitin Indurkhya is co-author of Text Mining (Springer), and co-editor of the Handbook of Natural Language Processing (CRC). Dr. Indurkhya is Principal Research Scientist at eBay. Previously, he was a Professor at the School of Computer Science and Engineering, University of New South Wales (Australia), as well as the founder and president of Data-Miner Pty Ltd, an Australian company engaged in data-mining consulting and education. Participants can ask questions and exchange comments directly with Dr. Indurkhya via a private discussion forum throughout each course. For details and to register: http://www.statistics.com/courses/data-mining-2/textmining/ http://www.statistics.com/courses/data-mining-2/sentiment-analysis/ The courses take place online at statistics.com in a series of weekly lessons and assignments, and require about 15 hours per week. Participate at your own convenience; there are no set hours when you must be online. Peter Bruce ourcour...@statistics.com P.S. Just let me know if you no longer wish to receive our course announcements. statistics.com 612 N. Jackson St. Arlington VA 22201 USA
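[Ed.: the "transform unstructured text into a form in which it can be mined" step the blurb describes usually starts with tokenization and term-frequency vectors. A minimal sketch, with an assumed toy stopword list - real pipelines use proper stemming, n-grams, and weighting such as tf-idf:]

```python
import re
from collections import Counter

STOPWORDS = frozenset({'the', 'a', 'an', 'is', 'of', 'and', 'to'})

def tokenize(text):
    """Lowercase and split on anything that isn't a letter."""
    return re.findall(r'[a-z]+', text.lower())

def term_vector(text):
    """Turn free text into a term-frequency vector (a Counter) -
    the basic form standard predictive methods can consume."""
    return Counter(t for t in tokenize(text) if t not in STOPWORDS)
```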
[CODE4LIB] Semantic web introduction to tools
This article came in via email this morning - it may be the kind of pointers I needed to read about open-source tools to get started using the SW. *Computerworld First Look* -- *Semantic Web: Tools you can use* http://cwonline.computerworld.com/t/7258117/240182/376767/0/ Standards, tools, platforms, prewritten components and services are available to help make semantic deployments less time-consuming, less technically complex and (somewhat) less costly. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363
Re: [CODE4LIB] LAMP Hosting service that supports php_yaz?
Maybe I shouldn't be trolling code4lib for my personal interests, but I'm asking not about a mission-critical application, but a platform for keeping my personal skills up, and that would be accorded the proportionate amount of time. So I'd rather that not be a time sink for management, and I don't want to create a hacker-cracker's delight. My college is not enthused about librarians creating code or platforms that the college becomes responsible for maintaining - we're very abstemious in that regard. So I'm seeing how I can do this personally, spending my own cash, without burdening my college. Sorry to bother you all with it. Everyone's happy family is different, to hash a quote, but I hope I'm still welcome in Code4Lib, even if I'm not hired to be a library coder. Just a library (Windows) sys admin. Or maybe we need a spin-off code4lib for the amateurs among us. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363 On Wed, Mar 23, 2011 at 10:55 AM, Bill Dueber b...@dueber.com wrote: On Wed, Mar 23, 2011 at 10:44 AM, Cary Gordon listu...@chillco.com wrote: You can probably find a curious intern to do it. Oh, for the love of god, please don't go this route. This is why libraries tend to be a huge mishmash of unsupported, one-off crap that some outgoing student did for extra credit six years ago. To ask the obvious question: You're at a real, honest-to-god prestigious college. Why are you trolling code4lib for cheap hosting environments? If IT won't give you a piece of a machine somewhere, or at least set up a Mac running OSX, they're failing to support a critical mission of the college and someone needs to be up in arms about it. If you haven't even asked them, well, maybe you should. -Bill, who spent his first two years in a library dealing with crappy old PHP code from long-gone students -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] LAMP Hosting service that supports php_yaz?
Sorry not to give more info on my first request. But I'm a little shy about owning up to my untried idea, since there are a lot of IFs and unknowns about the whole project. By the time I get anywhere on this project, III will have made it possible to link Subject Guides from Encore, and my idea will have no chance of adoption at my own school. And I must admit, my goals are equal parts wanting to bring my idea to reality with my own hands and wanting to provide a possible aid toward addressing the problem that students don't always make the transition from Google to the library resources. My idea is/was this: I created a Firefox extension that fires upon a Google search. The idea is to identify the general subject area of the search, and pop up a notice about the pertinent library subject guides, reminding the user that these resources are paid for and selected by the libraries. My idea for identifying the general subject area was to use a catalog search, mapping the call numbers of the resulting hits to the given subject guides. So I prototyped this in ASP .NET, which is the platform I have most experience with, using YAZ/VB-Zoom to perform the Z39.50 search. This is an example of my prototype pop-up. A warning: I assumed it was a pop-up, so if you're seeing it in tabs, it's going to resize your browser. That needs work. http://lisv06.colgate.edu/aftergoogle/default.aspx?searchargs=iran+nuclear There are a lot of questions to be answered, though. What proportion of Google searches consist of phrases that could be found in the catalog? What proportion of Google searches (in our computer labs, for instance) need scholarly information? To answer those questions, I intended to log the searches and the success of the mapping. Well, my administration rejected my proposal to test the app in their reference area. So I showed my app to Andrew Darby, author of Subjects Plus, the app we use for our subject guides, and he was interested - if I could port it over to PHP.
And since it would need to support a variety of ILSs, Z39.50 still seems to be the most likely technology. That was last October, and I have yet to get PHP_YAZ working. Of course, this is a squeeze-in in my own spare time, so the time devoted to it is sporadic. And the idea of offering it to other small libs is also why I would want to have the app hosted. If Andrew were to offer it as a part of Subjects Plus, it would have to be something that a library like mine could support without a lot of in-house support. So I need to know what hosting service could make it easiest for a small library with a small staff. I know there are other questions - what kind of burden on the ILS would this be? I know Z39.50 is old technology, and there are probably other problems you all can predict. So that's what I'm up to. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363 On Wed, Mar 23, 2011 at 11:28 AM, Jon Gorman jonathan.gor...@gmail.com wrote: On Wed, Mar 23, 2011 at 10:13 AM, Cindy Harper char...@colgate.edu wrote: Sorry to bother you all with it. Everyone's happy family is different, to hash a quote, but I hope I'm still welcome in Code4Lib, even if I'm not hired to be a library coder. Just a library (Windows) sys admin. Or maybe we need a spin-off code4lib for the amateurs among us. I think Bill meant why are you coming down here with us trolls when you're at such a nice place? You're quite welcome, although you've certainly got my curiosity up about why you want to run php_yaz in the first place. You didn't have much in the way of details in your initial email. It might change some people's advice if you're not intending the system to be a long-term production system. (And I'm still curious what systems are even using php_yaz) Jon Gorman
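[Ed.: the core step of the "after Google" idea - mapping the call numbers of catalog hits onto subject guides - can be sketched independently of Z39.50 or PHP. A hedged Python illustration: the guide names and LC ranges below are made up, and a real deployment would load the table from the subject-guide system rather than hard-coding it.]

```python
import re
from collections import Counter

# Hypothetical LC-class-to-guide table (illustrative only).
GUIDES = {
    'QA': 'Mathematics & Computer Science',
    'QC': 'Physics',
    'JZ': 'International Relations',
    'E':  'American History',
}

def guide_for(call_number):
    """Map one LC call number to a guide via its leading class
    letters, preferring the longest known prefix."""
    m = re.match(r'([A-Z]+)', call_number.upper())
    if not m:
        return None
    prefix = m.group(1)
    for length in range(len(prefix), 0, -1):
        if prefix[:length] in GUIDES:
            return GUIDES[prefix[:length]]
    return None

def suggest_guides(call_numbers, top=2):
    """Tally guides across the call numbers of the first hits and
    return the most common ones - the pop-up's suggestion list."""
    votes = Counter(g for g in map(guide_for, call_numbers) if g)
    return [guide for guide, _ in votes.most_common(top)]
```

The Z39.50 search itself (via php_yaz or any ZOOM binding) would just supply the `call_numbers` list.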
Re: [CODE4LIB] LAMP Hosting service that supports php_yaz?
Sorry - what do you mean by "triggers their usage monitor" - CPU usage above a certain threshold? Or they don't allow compiles? I spoke with Bluehost, and they indicated that if I got SSH access, I could try to compile it myself. I'll check to see if this is possible with Lunarpages, which we now have accounts with. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363 On Mon, Mar 7, 2011 at 1:58 PM, Ross Singer rossfsin...@gmail.com wrote: Cindy, I think this might be possible, depending on the provider. I have a site on Site5 and this seems pretty doable (it looks like I might have even tried this at some point, since I seem to have a compiled version of yaz in my home directory). It would probably take some rooting around in the forums to see how people are successfully installing PECL extensions, and it might take a few tries to compile yaz successfully (since if it triggers their usage monitor, they'll kill the process), but I think it would be worth a shot. I would definitely recommend this before jumping to a VPS (and let's be realistic, everybody, if you're being this blasé about running a VPS, you are either investing some time/expertise sysadmining it or you have an insecure server waiting to be exploited). Good luck! -Ross. On Mon, Mar 7, 2011 at 1:17 PM, Cindy Harper char...@colgate.edu wrote: I guess I was hoping to have service such as that provided by my current hosting service, where security, etc., updates for LAMP are all taken care of by the host. Any recommendations along those lines? One that provides that and still lets me install what I want? My service suggested that I go to a VPS account, where I'd have to do my own updates. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363 On Mon, Mar 7, 2011 at 11:00 AM, Han, Yan h...@u.library.arizona.edu wrote: You can just buy a node from a variety of cloud providers such as Amazon EC2, Linode etc.
(It is very easy to build anything you want). Yan -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Cindy Harper Sent: Sunday, March 06, 2011 10:54 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] LAMP Hosting service that supports php_yaz? At the risk of exhausting my quota of messages for the month - Our LAMP hosting service does not support PECL extension php_yaz. Does anyone know of a service that does? Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363
Re: [CODE4LIB] LAMP Hosting service that supports php_yaz?
Thanks, Ross. So that's why they call it nice? As usual I have much to learn. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363 On Tue, Mar 8, 2011 at 10:50 AM, Ross Singer rossfsin...@gmail.com wrote: Cindy, sorry, I realize that was vague. I have shell access on Site5, but since you're using shared resources, they monitor your CPU/memory usage. During high volume on a particular server, they'll kill processes that are running to make sure they can meet demands. This *could* happen when you're trying to compile something, which tends to be CPU-intensive, although it just depends. I've had their trigger kick in while trying to install ruby gems, although it's completely unpredictable (that is, based on all sorts of variables) - sometimes the gems install with no problem, other times they're killed. Compiling yaz is probably less of an issue (the makefile calls lots of things that run intensely, but quickly) than the pecl install of php/yaz. Running things in nice (http://linux.die.net/man/2/nice) probably helps your chances, but YMMV. I don't think this policy is exclusive to Site5; pretty much all of the major shared web hosting providers will have something similar in place, otherwise users could constantly have processes running in shells. Like I said, though, it shouldn't be a problem, it just might take a few tries (which will be less work, in the long run, than running your own VPS). -Ross. On Tue, Mar 8, 2011 at 10:05 AM, Cindy Harper char...@colgate.edu wrote: Sorry - what do you mean by "triggers their usage monitor" - CPU usage above a certain threshold? Or they don't allow compiles? I spoke with Bluehost, and they indicated that if I got SSH access, I could try to compile it myself. I'll check to see if this is possible with Lunarpages, which we now have accounts with.
Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363 On Mon, Mar 7, 2011 at 1:58 PM, Ross Singer rossfsin...@gmail.com wrote: Cindy, I think this might be possible, depending on the provider. I have a site on Site5 and this seems pretty doable (it looks like I might have even tried this at some point, since I seem to have a compiled version of yaz in my home directory). It would probably take some rooting around in the forums to see how people are successfully installing PECL extensions, and it might take a few tries to compile yaz successfully (since if it triggers their usage monitor, they'll kill the process), but I think it would be worth a shot. I would definitely recommend this before jumping to a VPS (and let's be realistic, everybody, if you're being this blasé about running a VPS, you are either investing some time/expertise sysadmining it or you have an insecure server waiting to be exploited). Good luck! -Ross. On Mon, Mar 7, 2011 at 1:17 PM, Cindy Harper char...@colgate.edu wrote: I guess I was hoping to have service such as that provided by my current hosting service, where security, etc., updates for LAMP are all taken care of by the host. Any recommendations along those lines? One that provides that and still lets me install what I want? My service suggested that I go to a VPS account, where I'd have to do my own updates. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363 On Mon, Mar 7, 2011 at 11:00 AM, Han, Yan h...@u.library.arizona.edu wrote: You can just buy a node from a variety of cloud providers such as Amazon EC2, Linode etc. (It is very easy to build anything you want). Yan -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Cindy Harper Sent: Sunday, March 06, 2011 10:54 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] LAMP Hosting service that supports php_yaz?
At the risk of exhausting my quota of messages for the month - Our LAMP hosting service does not support PECL extension php_yaz. Does anyone know of a service that does? Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363
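[Ed.: Ross's `nice` suggestion in the thread above can be demonstrated from Python, too. The sketch below is Unix-specific and purely illustrative: `os.nice(n)` adds `n` to the process's "niceness" (higher = lower scheduling priority, which is why low-priority compiles are less likely to trip a shared host's usage monitor) and returns the new value.]

```python
import os

# os.nice(increment) adds to the current niceness and returns the
# new value; an increment of 0 just reads it without changing it.
before = os.nice(0)
after = os.nice(5)   # five points "nicer" = lower scheduling priority
print(before, after)
```

The shell equivalent for a build would be running each step under `nice`, e.g. `nice ./configure` and `nice make`.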
Re: [CODE4LIB] LAMP Hosting service that supports php_yaz?
I guess I was hoping to have service such as that provided by my current hosting service, where security, etc., updates for LAMP are all taken care of by the host. Any recommendations along those lines? One that provides that and still lets me install what I want? My service suggested that I go to a VPS account, where I'd have to do my own updates. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363 On Mon, Mar 7, 2011 at 11:00 AM, Han, Yan h...@u.library.arizona.edu wrote: You can just buy a node from a variety of cloud providers such as Amazon EC2, Linode etc. (It is very easy to build anything you want). Yan -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Cindy Harper Sent: Sunday, March 06, 2011 10:54 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] LAMP Hosting service that supports php_yaz? At the risk of exhausting my quota of messages for the month - Our LAMP hosting service does not support the PECL extension php_yaz. Does anyone know of a service that does? Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363
Re: [CODE4LIB] online course on the semantic web?
The JHU course is a semester-long equivalent, and is in the $3000 range. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363 On Sat, Mar 5, 2011 at 7:51 PM, Joe Hourcle onei...@grace.nascom.nasa.gov wrote: On Mar 5, 2011, at 3:01 PM, Cindy Harper wrote: Well, I just walked my 80-year-old mother through setting up her wireless router and wireless on her desktop and laptop via telephone NY-to-VA, and now I feel like I can think about another challenge for the coming season(s). Does anyone know of a good online course that's an introduction to semantic web technology that they could recommend? My goals are simply to understand more and be able to code a little, and afterward apply it to linked data. I know of one course this summer at Johns Hopkins Engineering for Professionals program http://ep.jhu.edu/course-homepages/viewpage.php?homepage_id=2993, but it's rather pricey. Anyone know of cheaper options or creative ideas for funding? I don't know how introductory it'd be, but ASIST has been doing a lot of 'webinars' this year, and there are ones coming up on the 9th and 13th on linked data, and the first one sounds like it'll cover some semantic web issues: http://asis.org/Conferences/webinars/2011/linked-data.html (I can't compare prices to the JHU one, as I didn't see any pricing on the JHU site; this round of ASIST webinars are $25 for members, $59 for non-members; some in the past have been free for ASIST members) Also, looking at MIT's Open Courseware catalog, I see a few individual lessons that might be applicable: http://ocw.mit.edu/index.htm In the past, I've looked at some of the courses from W3schools (not affiliated with W3C, but has some tutorials on various things related to the web).
They tend to be fairly introductory, but they have two that might be of interest: http://www.w3schools.com/rdf/default.asp http://www.w3schools.com/semweb/default.asp -Joe - Joe Hourcle Programmer/Analyst Solar Data Analysis Center Goddard Space Flight Center
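[Ed.: for anyone who wants the ten-second version of what those RDF tutorials cover - linked data boils down to subject-predicate-object triples, with URIs as names. A tiny hand-written example in Turtle syntax; the record URI is illustrative (`example.org`), while the `dc:` namespace is the real Dublin Core elements vocabulary:]

```turtle
@prefix dc: <http://purl.org/dc/elements/1.1/> .

# One bibliographic resource, described by two triples.
<http://example.org/catalog/record/1234>
    dc:title   "Walden" ;
    dc:creator "Thoreau, Henry David" .
```

The "linked" part comes from replacing literal strings like the creator with URIs (e.g. an authority-file identifier) that other datasets also use.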
[CODE4LIB] LAMP Hosting service that supports php_yaz?
At the risk of exhausting my quota of messages for the month - Our LAMP hosting service does not support PECL extension php_yaz. Does anyone know of a service that does? Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363
Re: [CODE4LIB] online course on the semantic web?
Thanks to Jerry and to Joe and Karen - these links look good! Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363 On Sun, Mar 6, 2011 at 12:59 PM, Jerry Persons jpers...@stanford.edu wrote: A couple of things you might look at: [1] just published, free HTML version ... Heath and Bizer http://linkeddatabook.com/editions/1.0/ Tom Heath and Christian Bizer (2011) Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool. [2] the materials for the 2-day intro to Web of Data from Talis http://dynamicorange.com/2010/11/03/web-of-data/ leading to: http://api.talis.com/stores/training/items/training.html http://api.talis.com/stores/training/items/exercises.html -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Cindy Harper Sent: Sunday, March 06, 2011 9:32 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] online course on the semantic web? The JHU course is a semester-long equivalent, and is in the $3000 range. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363 On Sat, Mar 5, 2011 at 7:51 PM, Joe Hourcle onei...@grace.nascom.nasa.gov wrote: On Mar 5, 2011, at 3:01 PM, Cindy Harper wrote: Well, I just walked my 80-year-old mother through setting up her wireless router and wireless on her desktop and laptop via telephone NY-to-VA, and now I feel like I can think about another challenge for the coming season(s). Does anyone know of a good online course that's an introduction to semantic web technology that they could recommend? My goals are simply to understand more and be able to code a little, and afterward apply it to linked data. I know of one course this summer at Johns Hopkins Engineering for Professionals program http://ep.jhu.edu/course-homepages/viewpage.php?homepage_id=2993, but it's rather pricey.
Anyone know of cheaper options or creative ideas for funding? I don't know how introductory it'd be, but ASIST has been doing a lot of 'webinars' this year, and there are ones coming up on the 9th and 13th on linked data, and the first one sounds like it'll cover some semantic web issues: http://asis.org/Conferences/webinars/2011/linked-data.html (I can't compare prices to the JHU one, as I didn't see any pricing on the JHU site; this round of ASIST webinars are $25 for members, $59 for non-members; some in the past have been free for ASIST members) Also, looking at MIT's Open Courseware catalog, I see a few individual lessons that might be applicable: http://ocw.mit.edu/index.htm In the past, I've looked at some of the courses from W3schools (not affiliated with W3C, but has some tutorials on various things related to the web). They tend to be fairly introductory, but they have two that might be of interest: http://www.w3schools.com/rdf/default.asp http://www.w3schools.com/semweb/default.asp -Joe - Joe Hourcle Programmer/Analyst Solar Data Analysis Center Goddard Space Flight Center
[CODE4LIB] online course on the semantic web?
Well, I just walked my 80-year-old mother through setting up her wireless router and wireless on her desktop and laptop via telephone NY-to-VA, and now I feel like I can think about another challenge for the coming season(s). Does anyone know of a good online course that's an introduction to semantic web technology that they could recommend? My goals are simply to understand more and be able to code a little, and afterward apply it to linked data. I know of one course this summer at Johns Hopkins Engineering for Professionals program http://ep.jhu.edu/course-homepages/viewpage.php?homepage_id=2993, but it's rather pricey. Anyone know of cheaper options or creative ideas for funding? Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363
Re: [CODE4LIB] online course on the semantic web?
Now that I think about it, this may be an opportunity to apply another idea that I was exploring in another context: I had written to syslib-l looking for anyone interested in collaborating on a staff technology training wiki that would link staff to free and authoritative web-based resources on a range of technology training subjects. Would anyone be interested in applying that idea to code4lib technology learning? How much effort would be required for someone who's well acquainted with the Semantic Web to contribute to a site that lists texts or curriculum for those who are interested in learning? I don't know if this is doable. Anyone interested? Or should I just find myself a text and wade through it? Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363 On Sat, Mar 5, 2011 at 3:01 PM, Cindy Harper char...@colgate.edu wrote: Well, I just walked my 80-year-old mother through setting up her wireless router and wireless on her desktop and laptop via telephone NY-to-VA, and now I feel like I can think about another challenge for the coming season(s). Does anyone know of a good online course that's an introduction to semantic web technology that they could recommend? My goals are simply to understand more and be able to code a little, and afterward applying it to linked data? I know of one course this summer at Johns Hopkins Engineering for Professionals program http://ep.jhu.edu/course-homepages/viewpage.php?homepage_id=2993, but it's rather pricey. Anyone know of cheaper options or creative ideas for funding? Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363
Re: [CODE4LIB] A suggested role for text mining in library catalogs?
Sorry - it's more reflective of me and my amateur status Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363 On Tue, Feb 22, 2011 at 9:46 AM, Rob Casson rob.cas...@gmail.com wrote: And I probably should have added to your thread on NGC4LIB, rather than Code4lib - I tend to conflate them. i'm offended ;)
Re: [CODE4LIB] A suggested role for text mining in library catalogs?
It's not ironic - my post was musing inspired by your work. I guess I wasn't sure if I understood your results. You were looking at the overall POS usage in the entire texts as a possible way of ranking the texts. I was wondering about the POS of particular search terms - those that could take on several POS. A related question - does Solr use stemming to widen the search to various POS? Then would it be meaningful to rank the given texts by the POS of the actual search terms? And has anyone looked at samples of user search terms - are they almost always noun phrases? Just wanting to understand what you have explored. And I probably should have added to your thread on NGC4LIB, rather than Code4lib - I tend to conflate them. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363 On Sat, Feb 19, 2011 at 5:42 PM, Eric Lease Morgan emor...@nd.edu wrote: On Feb 19, 2011, at 11:26 AM, Cindy Harper wrote: I just was testing our discovery engine for any technical issues after a reboot. I was just using random single words, and one word I used was correct. Looking at the first ranked items, I wondered if there's some role for parts-of-speech in ranking hits - are nouns and, in this case, adjectives more indicative of aboutness than verbs? The first items were Miss Manners' ... Excruciatingly Correct Behavior, then a bunch of govdocs on an act to correct. I don't think there's any reason to prefer nouns over verbs, but I thought I'd throw the thought at you anyway. Ironically, I was playing with parts-of-speech (POS) analysis the other day. [1] Using a pseudo-random sample of texts, I found there to be surprisingly similar POS usage between texts. With such similarity, I thought it would be difficult to use general POS as a means for ranking or sorting. On the other hand, specific POS may be useful. For example, Thoreau was dominated by first-person male pronouns but Austen was dominated by third-person female pronouns.
I think there is something to be explored here. [1] POS - http://bit.ly/hsxD2i -- Eric Still Counting Tweets and Chats Morgan
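[Ed.: on the stemming question in the thread above - a toy illustration of why POS distinctions tend to vanish at index time. A naive suffix-stripping stemmer (loosely Porter-flavored, vastly simplified; the suffix list is an assumption for illustration) maps the noun, verb, adjective, and adverb forms of "correct" onto a single index term:]

```python
# Longest suffixes first, so "correction" hits "ion" and not "s" etc.
SUFFIXES = ('ations', 'ation', 'ings', 'ing', 'ion', 'ly', 'ed', 'es', 's')

def stem(word):
    """Strip the first matching suffix, keeping a stem of at least
    three letters. A real engine would use Porter or Snowball."""
    word = word.lower()
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[:-len(suf)]
    return word
```

Once "correct", "correctly", "corrected", and "correction" all index as one stem, ranking by the POS of the user's exact search term would require keeping POS information that stemming has already thrown away.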
[CODE4LIB] A suggested role for text mining in library catalogs?
I just was testing our discovery engine for any technical issues after a reboot. I was just using random single words, and one word I used was correct. Looking at the first ranked items, I wondered if there's some role for parts-of-speech in ranking hits - are nouns and, in this case, adjectives more indicative of aboutness than verbs? The first items were Miss Manners' ... Excruciatingly Correct Behavior, then a bunch of govdocs on an act to correct. I don't think there's any reason to prefer nouns over verbs, but I thought I'd throw the thought at you anyway. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363
Re: [CODE4LIB] simple, flexible ILS for a small library
Joanne indicated there's a negative scanner on Level 5 - is that true? Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363 On Wed, Oct 6, 2010 at 8:42 AM, Jon Gorman jonathan.gor...@gmail.com wrote: On Wed, Oct 6, 2010 at 12:05 AM, Susan Kane adarconsult...@gmail.com wrote: I wonder if this person might be better served by some kind of bartering software. I wasn't sure there was such a thing -- but of course, there is. http://www.curomuto.com/ http://www.barter-blog.com/?p=51 There's also some book-swap and book exchange sites and projects out there. It would be interesting to try to merge the three. The OP seems to really want to use VuFind, so if you could use a book swap software on the administrative side and have that update the catalog, that would be useful. Or maybe even see if it's possible to create drivers for some book swap. The problem with the book-swap is it is usually one person to one person, end to end. There's no in-between library. Book Mooch is the one I've heard a lot about, although I haven't ever used it. Jon Gorman
Re: [CODE4LIB] [NGC4LIB] CSU library finds 40% of collection hasn't circulated
My mistake - wrong list! On Tue, Oct 5, 2010 at 10:59 AM, Cindy Harper char...@colgate.edu wrote: Colgate University built an on-site ASRS in 2005 as part of renovating our entire main library. During the 2 years of construction on
[CODE4LIB] Fwd: [NGC4LIB] CSU library finds 40% of collection hasn't circulated
Colgate University built an on-site ASRS in 2005 as part of renovating our entire main library. During the 2 years of construction on the building, our services were dispersed among several buildings on campus, and the high-use portion of the collection that remained available to our students during that time was entirely housed in the ASRS, requested through our online catalog, and delivered to our circulation point in utility-vehicle loads. Of course, we also made major use of the ConnectNY user-initiated resource sharing and traditional ILL. There was user dissatisfaction at first, but one thing we learned is that patrons were greatly pleased when we ran a public awareness campaign to show them how to virtually browse the stacks in call-number order using the OPAC. The other thing we heard when we moved back into our renovated building was that students were disappointed that they had to go to the stacks and find the books themselves! And faculty were disappointed when we stopped delivering directly to their offices, of course - but we want them to come to the library :) . When we opened the new building, we brought up the Encore discovery system and blended it into the classic OPAC site as our keyword search (classic indexes are still available in other tabs). Encore doesn't have a virtual call-number browse feature, but we have asked for this as an enhancement - either a linear browse of the shelves, or a hierarchical call-number facet drill-down. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363 On Tue, Oct 5, 2010 at 10:01 AM, Emily Lynema emily_lyn...@ncsu.edu wrote: I agree with Dan that it is a bit of a moot point to argue about the benefits of moving materials to off-site storage. It is absolutely going to happen. But here's the thing: it's been happening for years as we buy more and more e-books and digital collections.
If the argument is that users need to be able to 'browse' the physical stacks, they've already been unable to discover digital materials in this way for some time now. But here's where I think this topic does tie in with NGC4LIB. The question we should be asking ourselves is: What are our patrons losing when we move our physical print materials off-site? Are there tools we can build to help them recover that usefulness in new ways? It's for that exact reason that we are continuing to explore different, enriched ways to browse the collection virtually at NCSU, in addition to thinking about what enhanced delivery services we can offer to our patrons to make it easier and more reliable to get a book out of an automated retrieval system than it was to go find it in the stacks. I bet there are a lot of cool new discovery tools we could think about that would make both digital collections AND materials stored off-site accessible to our patrons. As for the use case that Tim pointed out, it seems like those materials should have been part of a reference collection of some sort. It goes without saying that as libraries contemplate major changes like these, our job is to be listening to our patrons so that we can learn what mistakes we might have made and remedy them. An interesting idea that has been tossed around here is to retain on open browsing shelves the materials most recently pulled from the ARS. Perhaps that would need to include materials most frequently pulled from the ARS, too. -emily -- Date: Fri, 1 Oct 2010 13:46:24 -0400 From: Dan Scott d...@coffeecode.net Subject: Re: CSU library finds 40% of collection hasn't circulated On Fri, Oct 1, 2010 at 7:29 AM, Kyle Banerjee baner...@uoregon.edu wrote: We're going to move out the books that are never checked out, the ones that are never used anymore. I hope they're not relying exclusively on circ transaction data to discover what is never used.
I realize this may sound insane, but a lot of materials are actually used *in the library* without being checked out. The nature of the resource and the people who use it have a lot to do with this. Years ago, a place I worked at did a major weeding and storage project. Just to be safe, we had the shelvers look at our proposed list, which contained tens of thousands of items, to see if any of them jumped out as things they recognized as materials that were used. While most were not, there were certainly a number that were. That's not at all insane. In fact, we use our next-generation ILS (Evergreen - did y'all catch that valiant attempt to link this thread to the supposed topic of the mailing list?) to record in-house uses, and when we did our own PR-free move of items from the stuffed circulation stacks into storage this summer, we used a combination of lack of circulation since 1985 and lack of recorded in-house uses since 2003 to determine likely suspects for movement into storage.
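The selection rule Dan describes - no circulation since 1985 and no recorded in-house use since 2003 - is easy to express as a filter over item records. A minimal sketch; the record layout and field names here are invented for illustration and are not Evergreen's actual schema:

```python
from datetime import date

# Hypothetical item records; field names are illustrative only.
# None means no use of that kind was ever recorded.
items = [
    {"barcode": "A1", "last_circ": date(1979, 5, 1), "last_inhouse": date(1999, 3, 2)},
    {"barcode": "A2", "last_circ": date(2009, 1, 15), "last_inhouse": None},
    {"barcode": "A3", "last_circ": None, "last_inhouse": date(2008, 6, 30)},
]

CIRC_CUTOFF = date(1985, 1, 1)
INHOUSE_CUTOFF = date(2003, 1, 1)

def storage_candidates(items):
    """Items with no circulation since 1985 AND no in-house use since 2003."""
    def stale(last_use, cutoff):
        return last_use is None or last_use < cutoff
    return [i["barcode"] for i in items
            if stale(i["last_circ"], CIRC_CUTOFF)
            and stale(i["last_inhouse"], INHOUSE_CUTOFF)]

print(storage_candidates(items))  # ['A1']
```

Note the conjunction: item A3 above has no circulation at all but a 2008 in-house use, so it stays in the open stacks - which is exactly the point of recording in-house uses before trusting circ data alone.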
[CODE4LIB] Innovative's Synergy
Hi All - III is touting their web-services based Synergy product as having the efficiency of a pre-indexed service and the timeliness of a just-in-time service. Does anyone know if the agreements they have made with database vendors to use these web services preclude an organization developing an open-source client to take advantage of those web services? Just curious. I suppose I should direct my question to EBSCO and Proquest directly. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363
[CODE4LIB] Google Book Search staff member?
Hi - I wonder - if the Google Book Search staff member who attended C4L10 is monitoring this list, could he contact me off-list? I didn't get a chance to continue the conversation that we almost started while waiting for dinner transportation Tuesday night, and I wonder what Google thinks of some ideas I have for using GBS data. I didn't even get your name! Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363
[CODE4LIB] Female roommate wanted for Code4libcon
Hi - I've booked a room at the Marriott. I would like to share with a female roommate to cut costs. Must mention that I snore, alas. Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363
[CODE4LIB] Bookmarking web links - authoritativeness or focused searching
I've been thinking about the role of libraries as promoters of authoritative works - helping to select and sort the plethora of information out there. And I heard another presentation about social media this morning. So I thought I'd bring up for discussion here some of the ideas I've been mulling over. Last week I sent this message to the Suggestions and Ideas forum at Delicious: http://support.delicious.com/forum/comments.php?DiscussionID=3237&page=1#Item_0 The basic idea is to develop a Delicious network of librarians, or a network of faculty members, then have one login whose network included those users, and share that login so that lots of people could share that network. Delicious responded that we could have a wiki where people posted their Delicious names so that others could add them to their personal networks, but that doesn't scale up very well. Or another project I've toyed with, involving focused searching: I started with Robert Teeter's index to Great Books lists, http://www.interleaves.org/~rteeter/grtalphaa.html. I've almost completed pulling them into a MySQL database so that I can sort the titles by the number of Great Books lists that mention each title. Then I thought about how one could do focused searching of the web, collecting pages with a title containing (best and books) or (great and books), and screen-scraping title lists (you'd have to have some heuristic method of identifying the data, of course, and I'm aware of what problems might arise there). But my test searches on that idea showed that one runs into a lot of commercial, ephemeral lists and spurious lists. Now, you could rely on crowd-sourcing to filter out the consensus by ranking by the number of sites/cites. But I thought you might want to differentiate between the sources - .edus, libraries, etc.
So that led me to speculate about a search engine that ranked just by links from .edus, library sites, and a librarian-vetted list of .orgs, scholarly publishers, etc. I think you can limit by .edu in the linked-from in Google - I haven't tried that much. If anyone here has experience using that technique, I'd like to hear about it. But I'm thinking now about the possibility of a search engine limited to sites cooperatively vetted by librarians, which would incorporate ranking by number of links - something more responsive than cataloging websites in our catalogs. Is anyone else thinking about these ideas? Or do you know of projects that approach this goal of leveraging librarians' vetting of authoritative sources? Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363
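The "rank by links from vetted sources" idea could be prototyped by counting only inbound links whose source host passes a whitelist check. A small sketch - the vetted hosts and the link data below are made up for illustration:

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical vetted sources: .edu hosts plus a librarian-curated host list.
VETTED_SUFFIXES = (".edu",)
VETTED_HOSTS = {"www.ala.org", "www.loc.gov"}

def is_vetted(url):
    """True if the URL's host is on the curated list or under a vetted TLD."""
    host = urlparse(url).netloc
    return host in VETTED_HOSTS or host.endswith(VETTED_SUFFIXES)

def rank_by_vetted_links(links):
    """links: iterable of (from_url, to_url) pairs, e.g. from a crawl.
    Count a link toward the target only when the citing page is vetted."""
    counts = Counter(to for frm, to in links if is_vetted(frm))
    return counts.most_common()

links = [
    ("https://library.colgate.edu/lists", "http://example.org/great-books"),
    ("https://www.ala.org/reads", "http://example.org/great-books"),
    ("https://spammy.biz/best-books", "http://example.org/best-sellers"),
]
print(rank_by_vetted_links(links))
```

In the example, the commercial list gets no score at all because its only citer fails the whitelist - which is the filtering behavior the post is after, at the cost of maintaining the whitelist cooperatively.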
Re: [CODE4LIB] indexing pdf files
We're just talking about creating an index, not a separate copy of the works, right? Because I imagine that copyright has a lot to do with why this type of thing doesn't already exist. On Wed, Sep 16, 2009 at 3:08 PM, Eric Lease Morgan emor...@nd.edu wrote: Eric Morgan wrote: http://infomotions.com/highlights/ Rosalyn Metz wrote: I have librarians that would kill for this. In fact I was talking to one about it the other day. She felt there must be a way to handle active reading and make it portable. This would be great in conjunction with RefWorks or Zotero or something along those lines. Yep, when I was creating this application for myself I wondered what it would be like if a whole group, say, an academic department, were to systematically contribute to such a thing. I thought the output would be pretty exciting. Mark A. Matienzo wrote: Have you considered using Solr's ExtractingRequestHandler [1] for the PDFs? We're using it at NYPL with pretty great success. [1] http://wiki.apache.org/solr/ExtractingRequestHandler Nope, never saw that previously. Thanks for the pointer. Peter Kiraly wrote: I would like to suggest an API for extracting text (including highlighted or annotated text) from PDF: iText (http://www.lowagie.com/iText/). This is a Java API (it has a C# port), and it helped me a lot when we worked with extraordinary PDF files. More tools! Thank you. danielle plumer wrote: My (much more primitive) version of the same thing involves reading and annotating articles using my Tablet PC. Although I do get a variety of print publications, I find I don't tend to annotate them as much anymore. I used to use EndNote to do the metadata, then I switched to Zotero. I hadn't thought to try to create a full-text search of the articles -- hmm. Yes, for a growing number of the tools I create I need to be thinking about Zotero as a way of remembering content. Thanks for... reminding me.
Erik Hatcher wrote: Here's a post on how easy it is to send PDF documents to Solr from Java: http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-and-solr-cell-apache-tika/ I'm looking forward to the arrival of my Solr books any day now. After reading it I hope to have a better handle on the guts of Solr as well as increase my abilities to do the sorts of things discussed at the URL above. Thank you, one and all for your replies. -- Eric Morgan -- Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363
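For anyone wanting to script the Solr Cell route Mark and Erik mention: the extract handler takes the raw document plus `literal.*` parameters for fields you want attached to the indexed record. A minimal stdlib sketch that only builds the HTTP request - the host and core name (`mycore`) are assumptions, so adjust for your installation:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed Solr location and core name - change for your installation.
SOLR_BASE = "http://localhost:8983/solr/mycore"

def extract_request(pdf_bytes, doc_id):
    """Build a POST to Solr's ExtractingRequestHandler (Solr Cell), which
    runs Tika over the document and indexes the extracted text under doc_id."""
    qs = urlencode({"literal.id": doc_id, "commit": "true"})
    return Request(f"{SOLR_BASE}/update/extract?{qs}",
                   data=pdf_bytes,
                   headers={"Content-Type": "application/pdf"},
                   method="POST")
```

Passing the result to `urllib.request.urlopen` would send it to a running Solr instance; error handling, batching, and multipart upload variants are left out of the sketch.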
Re: [CODE4LIB] R?
I took some online courses in data mining last year at statistics.com, some of which featured R. I was pleased with it, although I haven't tried to integrate it into any programming project, and I only scratched the surface. I also would highly recommend the courses at statistics.com. Now if I could just work out the data collection to make use of the data-mining techniques on our library data. On Thu, Sep 10, 2009 at 9:59 AM, Glen Newton - NRC/CNRC CISTI/ICIST Research glen.new...@nrc-cnrc.gc.ca wrote: William == William Denton w...@pobox.com writes: William Are any of you using R? http://www.r-project.org/ I use R for a number of things, including the multidimensional scaling (512 -> 2) I do here: http://zzzoot.blogspot.com/2009/07/project-torngat-building-large-scale.html It is fast, backed by the stats brainiacs, and has a huge number of domain-specific modules (biology, genomics, geology, engineering, ...). It is great. Slices bread, juliennes fries, casts my votes, does my taxes, feeds my dogs, and submits my postings to code4lib. ;-) -glen William == William Denton w...@pobox.com writes: William Are any of you using R? http://www.r-project.org/ William Blog about R, info viz, etc.: William http://blog.revolution-computing.com/ William I have something in mind I'm going to try fooling around William with in R, but I wondered if anyone was using it for William visualizing searches, usage, networks of information, William that kind of thing.
William Bill -- William Denton, Toronto : miskatonic.org William www.frbr.org openfrbr.org -- Glen Newton | glen.new...@nrc-cnrc.gc.ca Researcher, Information Science, CISTI Research NRC W3C Advisory Committee Representative http://tinyurl.com/yvchmu tel/tél: 613-990-9163 | facsimile/télécopieur: 613-952-8246 Canada Institute for Scientific and Technical Information (CISTI) National Research Council Canada (NRC) | M-55, 1200 Montreal Road http://www.nrc-cnrc.gc.ca/ Institut canadien de l'information scientifique et technique (ICIST) Conseil national de recherches Canada | M-55, 1200 chemin Montréal Ottawa, Ontario K1A 0R6 Government of Canada | Gouvernement du Canada -- -- Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363
Re: [CODE4LIB] A little Google Book Search project: GoogleBSCites - Ranking by Google Book Search
I thought someone out there might be interested in a poster session I just did at the Innovative Users Group Conference 2009. I undertook the project because I was personally interested in the outcome, and because I look forward to the day when these data will be available - from Google, from the Internet Archive, from HathiTrust, from ... It's fraught with problems, with both recall and precision errors, but I call it an approximation of citation searching for the books in the Colgate collection, ranking them by the number of hits. I took about 688,000 monographic records that had both an author and a title from the Colgate library catalog and constructed a search in Google Book Search. Since I wanted to find citations - or other books that mentioned the book in question - I didn't restrict by field. The query was the title phrase from 245 subfields a and b, up to 10 words long, plus: the first two words of the author (if a personal author); the author phrase (if a conference author); or the first 6 words of the author (if a corporate author). I searched these over the course of 3/1/2009 - 4/27/2009 at less than 380 searches an hour (it took 3 machines to get the job done in 6 weeks), then screen-scraped Google's reported "1 to 8 of # hits" counts. The results rank these by the number of citations. http://lisv06.colgate.edu/GBSCites/default.aspx My results omit GovDocs for the time being, since I forgot to download the 086s into the records - I could add that later. Those corporate bodies are problems in my search strategy anyway. I did include them in the search portion of the project. I don't know how many users this MySQL site will support - it's entirely un-stress-tested, but I trust you won't all go searching it at once. -- Cindy Harper, Systems Librarian Colgate University Libraries char...@colgate.edu 315-228-7363
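For the curious, the query-construction rules described above (245 $a $b title phrase capped at ten words, plus an author fragment whose length depends on the author type) might look roughly like this in code. This is a reconstruction from the post's description, not the actual scripts used for the project:

```python
def gbs_query(title, author, author_type):
    """Build a search string per the rules described in the post:
    a quoted title phrase (245 $a $b, max 10 words) plus an author
    fragment sized by author type (personal/conference/corporate)."""
    title_phrase = " ".join(title.split()[:10])
    words = author.split()
    if author_type == "personal":
        author_part = " ".join(words[:2])       # first two words of the author
    elif author_type == "conference":
        author_part = f'"{author}"'             # the whole author phrase
    else:                                       # corporate author
        author_part = " ".join(words[:6])       # first six words of the author
    return f'"{title_phrase}" {author_part}'

print(gbs_query("Walden, or Life in the Woods", "Thoreau, Henry David", "personal"))
```

Screen-scraping the result counts and the politeness throttling (under 380 searches an hour) would sit on top of this; they are omitted here.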