Re: [CODE4LIB] NY Times Bookmarklet

2011-05-02 Thread Erin R White/FS/VCU
Good call. I ended up rewriting the bookmarklet to look at NYT's meta 
tags rather than the body text as they are more uniform and usually 
contain the info we need. They have fairly consistent use across the 
archives for the years I quickly tested (2003-present).

The script now looks for a published title in the meta tags, and if it 
doesn't find one, looks for the headline used in the article. I changed 
the date search to a fuzzy match by month and year rather than by exact 
date, since pubdate can be inconsistent with Lexis's.

Highly unscientific and not 100% accurate, so this bookmarklet's 
definitely a bit of a hack, but should work *most* of the time for 
articles that made it into the print version.

https://gist.github.com/944809#file_nyt_lexis_bookmarklet_meta_tags.js
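
For anyone skimming without opening the gist, here is a minimal sketch of the lookup order described above. It is not the gist's code. The meta tag names (`hdl_p`, `hdl`, `pdate`) and the YYYYMMDD date format are assumptions for illustration, so check the page source for the names actually in use; `readMeta`, `pickTitle`, and `monthAndYear` are hypothetical helpers.

```javascript
// Sketch only: these meta tag names are assumptions, not confirmed NYT markup.
var PRINT_TITLE_META = 'hdl_p'; // hypothetical name for the print headline
var WEB_TITLE_META = 'hdl';     // hypothetical name for the web headline
var PUBDATE_META = 'pdate';     // hypothetical name, assumed to hold YYYYMMDD

// Read every <meta name="..."> value into a plain object (browser only).
function readMeta(doc) {
  var out = {};
  var tags = doc.getElementsByTagName('meta');
  for (var i = 0; i < tags.length; i++) {
    var name = tags[i].getAttribute('name');
    if (name) { out[name] = tags[i].getAttribute('content') || ''; }
  }
  return out;
}

// Prefer the published (print) title, fall back to the web headline.
function pickTitle(meta) {
  return meta[PRINT_TITLE_META] || meta[WEB_TITLE_META] || null;
}

// Reduce a YYYYMMDD string to month and year for a fuzzy date match.
function monthAndYear(pdate) {
  var m = /^(\d{4})(\d{2})\d{2}$/.exec(pdate || '');
  if (!m) { return null; }
  return { year: parseInt(m[1], 10), month: parseInt(m[2], 10) };
}
```

In a bookmarklet this would be called as `pickTitle(readMeta(document))`; how the month and year are then expressed in the database query is left out here because that syntax depends on the vendor.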

--
Erin White
Web Applications Developer, VCU Libraries
804-827-3552 | erwh...@vcu.edu | http://library.vcu.edu/





Re: [CODE4LIB] NY Times Bookmarklet

2011-04-29 Thread Bob Duncan

Date: Wed, 27 Apr 2011 09:10:20 -0400
From: Van Mil, James (vanmiljf) vanmi...@ucmail.uc.edu
Subject: NY Times Bookmarklet
. . .
However, every article at the web version of the NY Times that was 
also published in the print version includes a reference to the 
article from the print edition, including date, page number, and 
print version title (information which is all still accessible in 
the page source when the paywall blocks access).



I wish this were true, but unfortunately, it's not.  Not every 
reference to the print version includes the print version 
headline.  In fact, it appears that including the print headline is a 
fairly recent addition to the Times Website.  (Very unscientific 
searching suggests it started within the last few weeks.)  I wonder 
if it might make more sense to grab the author's name and pass that 
with the print pub date to PQ/LexisNexis instead -- most articles 
seem to include a byline.  Or grab the beginning sentence and pass 
that.  (You'd have to get rid of any anchor elements.)  It also 
appears that not every article that's published in print includes a 
reference to the print version in the Web version, but most seem to.
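
The "grab the beginning sentence" idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the caller has already pulled out the HTML of the article's first paragraph, and a sentence ends at the first period, question mark, or exclamation point followed by whitespace (so abbreviations such as "Mr." will cut it short). `stripAnchors` and `firstSentence` are hypothetical names.

```javascript
// Remove <a ...> and </a> tags but keep the link text between them.
function stripAnchors(html) {
  return html.replace(/<\/?a\b[^>]*>/gi, '');
}

// Return the text up to the first sentence-ending punctuation mark.
function firstSentence(paragraphHtml) {
  var text = stripAnchors(paragraphHtml).replace(/\s+/g, ' ').trim();
  var m = /^.*?[.!?](?=\s|$)/.exec(text);
  return m ? m[0] : text;
}
```

The result would still need URL encoding before being passed to a database search.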


Bob Duncan


~!~!~!~!~!~!~!~!~!~!~!~!~
Robert E. Duncan
Systems Librarian
Lafayette College
Easton, PA  18042
dunc...@lafayette.edu
http://library.lafayette.edu/ 


[CODE4LIB] NY Times Bookmarklet

2011-04-27 Thread Van Mil, James (vanmiljf)
Hi everyone! (first post!)

We've been getting lots of feedback at my library about the problem with the NY 
Times paywall and the lack of institutional access to their website, but we do 
have a subscription to a Proquest database which includes all current content 
that is included in the print edition.

However, every article at the web version of the NY Times that was also 
published in the print version includes a reference to the article from the 
print edition, including date, page number, and print version title 
(information which is all still accessible in the page source when the paywall 
blocks access). Additionally, the Proquest database has very clear search 
syntax.

So, I wrote a bookmarklet to check whether the article was published in print 
and to open a new browser window to search for the article at Proquest. (I know 
that there are other work-arounds to the paywall, but I'm interested in one 
that our library could ethically promote.)

The code for the bookmarklet is short, so I've included it below. I'd like to 
add the option to search the headline in Google News for any articles that 
aren't available in the print version, and I need to write some title-string 
sanitization to deal with some funky punctuation in the occasional headline. If 
anyone has any other feedback, I'd love to hear it. (And I apologize both for 
the lack of commenting (bookmarklets don't seem to have room for this) and for 
the lack of style (I started learning JavaScript yesterday).)

Thanks!
James

James Van Mil
Collections & Electronic Resources Librarian
Electronic Resources Department
University of Cincinnati Libraries
PO Box 210033
Cincinnati OH 45221-0033
Telephone: (513) 556-1410
vanmi...@ucmail.uc.edu



javascript:
(
function()
{
  /* Look for the print-edition notice in the page source. */
  var source = document.documentElement.innerHTML;
  var regex1 = /A version of this article appeared in print on ((January|February|March|April|May|June|July|August|September|October|November|December) ([1-2][0-9]|3[0-1]|0?[1-9]), ((19|20)[0-9][0-9])), on page (\w+) of the New York edition with the headline:(.*)/g;
  var match = regex1.exec(source);
  if (match)
  {
    /* match[2] = month, match[3] = day, match[4] = year, match[6] = page, match[7] = headline */
    var articleDate = new Date(match[2] + ' ' + match[3] + ', ' + match[4]);
    var articleYear = articleDate.getFullYear();
    var articleMonth = articleDate.getMonth()+1;
    var articleDay = articleDate.getDate();
    var regex2 = /([A-Z]+)(\d+)/g;
    var pageMatch = regex2.exec(match[6]);
    window.open('https://proxy.libraries.uc.edu/login?url=http://proquest.umi.com/pqdweb?RQT=305&SQ=issn%2803624331%29%20and%20ti%28' + match[7] + '%29%20and%20pdn%28' + articleMonth + '%2F' + articleDay + '%2F' + articleYear + '%29%20and%20startpage%28' + pageMatch[1] + '.' + pageMatch[2] + '%29');
  }
  else
  {
    alert("This article hasn't been published in the print version of the NY Times and isn't accessible through the UC Libraries.");
  }
}
)
();
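
The title-string sanitization mentioned above might look something like the sketch below. Which characters actually cause trouble isn't stated, so the set handled here (stray markup, two HTML entities, curly quotes, dashes, and parentheses that would clash with the `ti( ... )` search syntax) is an assumption, and `sanitizeTitle` and `titleForUrl` are hypothetical names.

```javascript
// Clean up a captured headline before it goes into a search URL.
function sanitizeTitle(raw) {
  return raw
    .replace(/<[^>]*>/g, ' ')          // drop any markup caught by (.*)
    .replace(/&amp;/g, '&')            // decode the two most common entities
    .replace(/&quot;/g, '"')
    .replace(/[\u2018\u2019]/g, "'")   // curly single quotes
    .replace(/[\u201C\u201D]/g, '"')   // curly double quotes
    .replace(/[\u2013\u2014]/g, '-')   // en and em dashes
    .replace(/[()]/g, ' ')             // parentheses would end ti( ... ) early
    .replace(/\s+/g, ' ')              // collapse runs of whitespace
    .trim();
}

// Sanitize, then percent-encode for use inside a query string.
function titleForUrl(raw) {
  return encodeURIComponent(sanitizeTitle(raw));
}
```

In the bookmarklet, `titleForUrl(match[7])` would take the place of the bare `match[7]`.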


Re: [CODE4LIB] NY Times Bookmarklet

2011-04-27 Thread Jonathan Rochkind

This is a great idea, thanks for sharing.




Re: [CODE4LIB] NY Times Bookmarklet

2011-04-27 Thread Erin R White/FS/VCU
James, AWESOME idea. I'm excited to share with my library.

For those of you who are getting NYT through LexisNexis I've modified the 
code below - just throw in your proxy URL and library name.

I also fixed the first regex to work with non-article items as well 
(op-eds, etc).

javascript:
(
function()
{
  /* Look for the print-edition notice in the page source. */
  var source = document.documentElement.innerHTML;
  var regex1 = /appeared in print on ((January|February|March|April|May|June|July|August|September|October|November|December) ([1-2][0-9]|3[0-1]|0?[1-9]), ((19|20)[0-9][0-9])), on page (\w+) of the New York edition with the headline:(.*)/g;
  var match = regex1.exec(source);
  if (match)
  {
    var articleDate = new Date(match[2] + ' ' + match[3] + ', ' + match[4]);
    var articleYear = articleDate.getFullYear();
    var articleMonth = articleDate.getMonth()+1;
    var articleDay = articleDate.getDate();
    /* The page number is parsed but not used in the LexisNexis search. */
    var regex2 = /([A-Z]+)(\d+)/g;
    var pageMatch = regex2.exec(match[6]);
    var articleURL = 'http://www.lexisnexis.com/us/lnacademic/api/version1/sr?shr=t&csi=6742&sr=HLEAD%28' + match[7] + '%29+AND+DATE+IS+' + articleMonth + '%2F' + articleDay + '%2F' + articleYear;
    window.open('http://proxy.library.vcu.edu/login?url=' + articleURL);
  }
  else
  {
    alert("This article hasn't been published in the print version of the NY Times and isn't accessible through VCU Libraries.");
  }
}
)
();
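
To make the "throw in your proxy URL and library name" step a little more explicit, the two site-specific values could be pulled into one place, roughly as below. This is a sketch, not a tested drop-in: `config`, `buildLexisUrl`, and `notFoundMessage` are hypothetical names, the `&` separators in the LexisNexis URL are my reading of the address above, and the `encodeURIComponent` call is an addition that the version above does not make.

```javascript
// Site-specific settings: change these two values for your library.
var config = {
  proxyPrefix: 'http://proxy.library.vcu.edu/login?url=',
  libraryName: 'VCU Libraries'
};

// Build the proxied LexisNexis search URL for a headline and print date.
function buildLexisUrl(cfg, headline, month, day, year) {
  var search = 'http://www.lexisnexis.com/us/lnacademic/api/version1/sr?shr=t&csi=6742&sr=HLEAD%28'
    + encodeURIComponent(headline)
    + '%29+AND+DATE+IS+' + month + '%2F' + day + '%2F' + year;
  return cfg.proxyPrefix + search;
}

// Message shown when no print-edition notice is found on the page.
function notFoundMessage(cfg) {
  return "This article hasn't been published in the print version of the NY Times and isn't accessible through " + cfg.libraryName + '.';
}
```

A bookmarklet would then call `window.open(buildLexisUrl(config, match[7], articleMonth, articleDay, articleYear))`.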

--
Erin White
Web Applications Developer, VCU Libraries
804-827-3552 | erwh...@vcu.edu | http://library.vcu.edu/





Re: [CODE4LIB] NY Times Bookmarklet

2011-04-27 Thread Andreas Orphanides
 
This is a kind of naive approach, and my lack of actually thinking through the 
matter is entirely a result of not having had to deal with it, but:
 
As I understand it, the NYT paywall doesn't count referrals from blog posts, 
FriendFace, the Twitchers and the like. I'm not sure how it figures this out 
(referrer possibly? but see sub). I do know that frequently I'll see a variety 
of different query parameters at the end of a nytimes.com URL when I get to one 
of their pages from another source.
 
Now, I've heard [1] that the NYT paywall isn't particularly sophisticated -- 
e.g., it doesn't work if Javascript is off. Also, bearing in mind that NO ONE 
would EVER make a practice of such a nefarious activity as trying to avoid a 
paywall through trickery, has anyone experimented with the effect of query 
parameters on said paywall?
 
-Dre.
 
[1] Rumors, rumors, rumors. Maybe somewhere on Daring Fireball?
 


Re: [CODE4LIB] NY Times Bookmarklet

2011-04-27 Thread Patrick Berry
I've collected these two versions in a public gist for easier hacking.

https://gist.github.com/944541

Pat


Re: [CODE4LIB] NY Times Bookmarklet

2011-04-27 Thread Jonathan Rochkind
Sure, I've experimented myself with getting around the paywall's 
restrictions; it's not hard.

It's not something I would suggest my organization publicly (or even 
privately, really) recommend to users or instruct users in how to do, 
however.


There's a role for libraries in this stuff, but I think it's probably 
NOT instructing our users in how to subvert the nytimes Terms of 
Service.  On the other hand, showing our users how to access the same 
nytimes article through a source the library pays for on their behalf, 
or from some other free source, like James' idea -- yeah, that's a great 
idea.



Re: [CODE4LIB] NY Times Bookmarklet

2011-04-27 Thread Andreas Orphanides
This is more or less what I expected. To be sure, I don't think it would be 
reasonable for an organization to put into practice any real paywall-dodging 
code, as opposed to redirecting the users to a legit alternative source. Using 
legitimate resources that we have access to is definitely the way for libraries 
to go, so kudos to the implementers for their awesomeness.

It's also surprising and interesting to see that the paywall, despite its 
technological shortcomings, cost the NYT something in the neighborhood of $40 
million to implement. [1] (Also, I guess if I'd read the comments in this blog 
post, it would have answered my earlier question.)

-Dre.

[1] 
http://blogs.law.harvard.edu/philg/2011/03/28/how-did-the-new-york-times-manage-to-spend-40-million-on-its-pay-wall/




Re: [CODE4LIB] NY Times Bookmarklet

2011-04-27 Thread McMillin, Paul
A bookmarklet like NYClean works only because the NYT has already sent the user 
the full text.  The bookmarklet changes the user's display so that the full 
text (which is already 'there') can be viewed.  Isn't this quite different than 
employing a script that 'reaches out' and grabs content from a 
content-provider's site?  And isn't it different than 
sharing/distributing/reproducing the content?  And aren't these differences 
ethically significant?

The ethics of individual use, however, may be significantly different than the 
ethics of what a library chooses to tell its users.  But, what if a library 
mentions on its website that you will not encounter the paywall if you have 
javascript disabled, or if you use Chrome incognito, or private browsing, or 
Tor?  I can't see any ethical problem with mentioning those possibilities.  So, 
what if the library also mentions that you will not encounter the paywall if 
you use the code/bookmarklet found at xxx.com?

I'm still thinking this through, but for now, I'd call it a gray zone.  Here 
are a few discussions (from different perspectives) on the paywall and ethics:

http://tunedin.blogs.time.com/2011/03/28/the-ny-times-paywall-goes-up-when-is-it-immoral-to-go-around-it/

http://www.niemanlab.org/2011/03/so-then-if-you-jump-the-new-york-times-paywall-are-you-stealing/

http://www.nytimes.com/2011/04/24/magazine/mag-24Ethicist-t.html

Regardless of the resolution of the ethics question, many thanks to James for 
the code linking the Times to a subscription database.  My library has not, 
yet, chosen to mention the workarounds on our website, but we will definitely 
implement and promote the code that links to our subscription access.  In the 
meantime, we have had a long talk with the Times about offering institutional 
subscription access to nytimes.com as soon as possible, and also about 
providing free access to the site for College Readership campuses (as part of 
that program, my college pays for 200 print copies of the Times each weekday, 
and then distributes those copies for free on campus).  We have also stressed 
that the paywall is a significant barrier to curricular use of the Times.

Paul


