Re: [CODE4LIB] Assigning DOI for local content

2009-11-23 Thread Eric Hellman
For example, if you don't want to rely on dx.doi.org as your gateway to the 
handle system for DOI resolution, it would be quite easy for me to deploy my 
own gateway at dx.hellman.net. I might want to do this if I were an 
organization paranoid about security and didn't want to disclose to anybody 
which DOIs my organization was resolving. Or, I might want to directly access 
metadata in the handle system without going through the HTTP gateways, to 
provide a service other than resolution.

Does this answer your question, Ross?



On Nov 20, 2009, at 2:31 PM, Ross Singer wrote:

 On Fri, Nov 20, 2009 at 2:23 PM, Eric Hellman e...@hellman.net wrote:
 Having incorporated the handle client software into my own stuff rather 
 easily, I'm pretty sure that's not true.
 
 Fair enough.  The technology is binding independent.
 
 So you are using and sharing handles using some protocol other than HTTP?
 
 I'm more interested in the sharing part of that question.  What is the
 format of the handle identifier in this context?  What advantage does
 it bring over HTTP?
 
 -Ross.

Eric Hellman
President, Gluejar, Inc.
41 Watchung Plaza, #132
Montclair, NJ 07042
USA

e...@hellman.net 
http://go-to-hellman.blogspot.com/


[CODE4LIB] calling another webpage within CGI script

2009-11-23 Thread Ken Irwin
Hi all,

I'm moving to a new web server and struggling to get it configured properly. 
The problem of the moment: having a Perl CGI script call another web page in 
the background and make decisions based on its content. On the old server I 
used an antique Perl script called hcat (from the Pelican book, 
http://oreilly.com/openbook/webclient/ch04.html); I've also tried curl 
and LWP::Simple.

In all three cases, I get the same behavior: it works just fine on the command 
line, but when called by the web server through a CGI script, the LWP (or other 
socket connection) gets no results. It sounds like a permissions thing, but I 
don't know what kind of permissions setting to tinker with. In the test script 
below, my command line outputs:

Content-type: text/plain
Getting URL: http://www.npr.org
885 lines

Whereas the web output just says "Getting URL: http://www.npr.org" - and 
doesn't even get to the "Couldn't get" error message.

Any clue how I can make use of a web page's contents from w/in a CGI script? 
(The actual application has to do with exporting data from our catalog, but I 
need to work out the basic mechanism first.)

Here's the script I'm using.

#!/bin/perl
use LWP::Simple;
print "Content-type: text/plain\n\n";
my $url = "http://www.npr.org";
print "Getting URL: $url\n";
my $content = get $url;
die "Couldn't get $url" unless defined $content;
@lines = split (/\n/, $content);
foreach (@lines) { $i++; }
print "\n\n$i lines\n\n";

Any ideas?

Thanks
Ken


Re: [CODE4LIB] calling another webpage within CGI script

2009-11-23 Thread Vishwam Annam

Ken,

The difference is that when you run it from the command line you are 
executing the file as /owner/, but as /Other/ when you access it through the 
browser. Looking at the error message you sent, I believe it might not 
be executing the complete script. Try setting permissions to 707 or 777 
to start with. You may have to create a temporary directory to test with.
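
A minimal sketch of checking and setting those permission bits from Perl 
(the file name here is hypothetical):

#!/usr/bin/perl
use strict; use warnings;
# Print the script's current permission bits, then open them up for testing.
my $file = 'test.cgi';                # hypothetical path to the CGI script
my $mode = (stat $file)[2] & 07777;   # mask off the file-type bits
printf "%s is currently mode %04o\n", $file, $mode;
chmod 0707, $file or warn "chmod failed: $!";  # 707, per the suggestion above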


Let me know if you have any questions,

Vishwam
Vishwam Annam
Wright State University Libraries
120 Paul Laurence Dunbar Library
3640 Colonel Glenn Hwy.
Dayton, OH 45435
Office: 937-775-3262
FAX 937-775-2356


Ken Irwin wrote:

Hi all,

I'm moving to a new web server and struggling to get it configured properly. The problem of the 
moment: having a Perl CGI script call another web page in the background and make decisions 
based on its content. On the old server I used an antique Perl script called hcat 
(from the Pelican book, http://oreilly.com/openbook/webclient/ch04.html); I've also tried 
curl and LWP::Simple.

In all three cases, I get the same behavior: it works just fine on the command 
line, but when called by the web server through a CGI script, the LWP (or other 
socket connection) gets no results. It sounds like a permissions thing, but I 
don't know what kind of permissions setting to tinker with. In the test script 
below, my command line outputs:

Content-type: text/plain
Getting URL: http://www.npr.org
885 lines

Whereas the web output just says "Getting URL: http://www.npr.org" - and doesn't even get 
to the "Couldn't get" error message.

Any clue how I can make use of a web page's contents from w/in a CGI script? 
(The actual application has to do with exporting data from our catalog, but I 
need to work out the basic mechanism first.)

Here's the script I'm using.

#!/bin/perl
use LWP::Simple;
print "Content-type: text/plain\n\n";
my $url = "http://www.npr.org";
print "Getting URL: $url\n";
my $content = get $url;
die "Couldn't get $url" unless defined $content;
@lines = split (/\n/, $content);
foreach (@lines) { $i++; }
print "\n\n$i lines\n\n";

Any ideas?

Thanks
Ken
  


Re: [CODE4LIB] calling another webpage within CGI script

2009-11-23 Thread Matt Jones
Hi Ken,

This may be obvious, but when running from the command line, stdout and
stderr are often interleaved together, but on the web server you see stdout
in the browser and stderr in the web server error log.  Your script is
probably exiting with an error either at the 'get' line (line 6) or at the
'die' line (line 7), which is what 'die' does -- terminate your script.
Have you checked your web server error log to see what the error is on your
'get' call?
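
One quick way to surface that error in the browser while debugging is
CGI::Carp's fatalsToBrowser, which routes the text of any die() to the
browser. A minimal sketch based on the posted test script (my variant,
not Ken's code):

#!/bin/perl
use LWP::Simple;
use CGI::Carp qw(fatalsToBrowser);  # send fatal errors to the browser
my $url = "http://www.npr.org";
my $content = get $url;
die "Couldn't get $url" unless defined $content;
print "Content-type: text/plain\n\n";
print "Got ", length($content), " bytes from $url\n";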

Matt

On Mon, Nov 23, 2009 at 7:17 AM, Ken Irwin kir...@wittenberg.edu wrote:

 Hi all,

 I'm moving to a new web server and struggling to get it configured
 properly. The problem of the moment: having a Perl CGI script call another
 web page in the background and make decisions based on its content. On the
 old server I used an antique Perl script called hcat (from the Pelican
 book, http://oreilly.com/openbook/webclient/ch04.html); I've also tried
 curl and LWP::Simple.

 In all three cases, I get the same behavior: it works just fine on the
 command line, but when called by the web server through a CGI script, the
 LWP (or other socket connection) gets no results. It sounds like a
 permissions thing, but I don't know what kind of permissions setting to
 tinker with. In the test script below, my command line outputs:

 Content-type: text/plain
 Getting URL: http://www.npr.org
 885 lines

 Whereas the web output just says "Getting URL: http://www.npr.org" - and
 doesn't even get to the "Couldn't get" error message.

 Any clue how I can make use of a web page's contents from w/in a CGI
 script? (The actual application has to do with exporting data from our
 catalog, but I need to work out the basic mechanism first.)

 Here's the script I'm using.

 #!/bin/perl
 use LWP::Simple;
 print "Content-type: text/plain\n\n";
 my $url = "http://www.npr.org";
 print "Getting URL: $url\n";
 my $content = get $url;
 die "Couldn't get $url" unless defined $content;
 @lines = split (/\n/, $content);
 foreach (@lines) { $i++; }
 print "\n\n$i lines\n\n";

 Any ideas?

 Thanks
 Ken



Re: [CODE4LIB] Assigning DOI for local content

2009-11-23 Thread Ross Singer
On Mon, Nov 23, 2009 at 1:07 PM, Eric Hellman e...@hellman.net wrote:

 Does this answer your question, Ross?

Yes, sort of.  My question was not so much whether you can resolve handles
via bindings other than HTTP (since that's one of the selling points
of handles) as it was: do people actually use this in the real world?

Of course, it may be impossible to answer that question since, by your
example, such people may not actually be letting anybody know that
they're doing that (although you would probably be somebody with
insider knowledge on this topic).

Also, with your use cases, would these services be impossible if the
only binding was HTTP?  Presumably dx.hellman.net would need to
harvest its metadata from somewhere, which seems like it would leave a
footprint.  It also needs some mechanism to stay in sync with the
master index.  Your non-resolution service also seems to be looking
these things up in realtime.  Would a RESTful or SOAP API (*shudder*)
not accomplish the same goal?

Really, though, the binding argument is less the issue here than
whether you believe HTTP URIs are valid identifiers or not, since there's no
reason a URI couldn't be dereferenced via other bindings, either.

-Ross.


Re: [CODE4LIB] Assigning DOI for local content

2009-11-23 Thread Roy Tennant
But minting DOIs requires a Registration Agency, which, as far as I understand
it, requires $1,000 and approval from the International DOI Foundation.[1]
Roy

[1] http://www.doi.org/handbook_2000/governance.html#7.2.2


On 11/23/09 10:07 AM, Eric Hellman e...@hellman.net wrote:

 For example, if you don't want to rely on dx.doi.org as your gateway to the
 handle system for DOI resolution, it would be quite easy for me to deploy my
 own gateway at dx.hellman.net. I might want to do this if I were an
 organization paranoid about security and didn't want to disclose to anybody
 which DOIs my organization was resolving. Or, I might want to directly access
 metadata in the handle system without going through the HTTP gateways, to
 provide a service other than resolution.
 
 Does this answer your question, Ross?
 
 
 
 On Nov 20, 2009, at 2:31 PM, Ross Singer wrote:
 
 On Fri, Nov 20, 2009 at 2:23 PM, Eric Hellman e...@hellman.net wrote:
 Having incorporated the handle client software into my own stuff rather
 easily, I'm pretty sure that's not true.
 
 Fair enough.  The technology is binding independent.
 
 So you are using and sharing handles using some protocol other than HTTP?
 
 I'm more interested in the sharing part of that question.  What is the
 format of the handle identifier in this context?  What advantage does
 it bring over HTTP?
 
 -Ross.
 
 Eric Hellman
 President, Gluejar, Inc.
 41 Watchung Plaza, #132
 Montclair, NJ 07042
 USA
 
 e...@hellman.net 
 http://go-to-hellman.blogspot.com/
 


Re: [CODE4LIB] calling another webpage within CGI script

2009-11-23 Thread Roy Tennant
Ken,
I tested your script on my server and it also worked for me on the command
line and failed via my web server. All I did was add /usr to your path to
perl and it worked:

#!/usr/bin/perl

Roy



On 11/23/09 8:17 AM, Ken Irwin kir...@wittenberg.edu wrote:

 Hi all,
 
 I'm moving to a new web server and struggling to get it configured properly.
 The problem of the moment: having a Perl CGI script call another web page in
 the background and make decisions based on its content. On the old server I
 used an antique Perl script called hcat (from the Pelican
 book, http://oreilly.com/openbook/webclient/ch04.html); I've also tried curl
 and LWP::Simple.
 
 In all three cases, I get the same behavior: it works just fine on the command
 line, but when called by the web server through a CGI script, the LWP (or
 other socket connection) gets no results. It sounds like a permissions thing,
 but I don't know what kind of permissions setting to tinker with. In the test
 script below, my command line outputs:
 
 Content-type: text/plain
 Getting URL: http://www.npr.org
 885 lines
 
 Whereas the web output just says "Getting URL: http://www.npr.org" - and
 doesn't even get to the "Couldn't get" error message.
 
 Any clue how I can make use of a web page's contents from w/in a CGI script?
 (The actual application has to do with exporting data from our catalog, but I
 need to work out the basic mechanism first.)
 
 Here's the script I'm using.
 
 #!/bin/perl
 use LWP::Simple;
 print "Content-type: text/plain\n\n";
 my $url = "http://www.npr.org";
 print "Getting URL: $url\n";
 my $content = get $url;
 die "Couldn't get $url" unless defined $content;
 @lines = split (/\n/, $content);
 foreach (@lines) { $i++; }
 print "\n\n$i lines\n\n";
 
 Any ideas?
 
 Thanks
 Ken
 


Re: [CODE4LIB] calling another webpage within CGI script

2009-11-23 Thread Joe Hourcle

On Mon, 23 Nov 2009, Ken Irwin wrote:


Hi all,

I'm moving to a new web server and struggling to get it configured properly. The problem of the 
moment: having a Perl CGI script call another web page in the background and make decisions 
based on its content. On the old server I used an antique Perl script called hcat 
(from the Pelican book, http://oreilly.com/openbook/webclient/ch04.html); I've also tried 
curl and LWP::Simple.

In all three cases, I get the same behavior: it works just fine on the command 
line, but when called by the web server through a CGI script, the LWP (or other 
socket connection) gets no results. It sounds like a permissions thing, but I 
don't know what kind of permissions setting to tinker with. In the test script 
below, my command line outputs:

Content-type: text/plain
Getting URL: http://www.npr.org
885 lines

Whereas the web output just says "Getting URL: http://www.npr.org" - and doesn't even get 
to the "Couldn't get" error message.

Any clue how I can make use of a web page's contents from w/in a CGI script? 
(The actual application has to do with exporting data from our catalog, but I 
need to work out the basic mechanism first.)

Here's the script I'm using.

#!/bin/perl
use LWP::Simple;
print "Content-type: text/plain\n\n";
my $url = "http://www.npr.org";
print "Getting URL: $url\n";
my $content = get $url;
die "Couldn't get $url" unless defined $content;
@lines = split (/\n/, $content);
foreach (@lines) { $i++; }
print "\n\n$i lines\n\n";

Any ideas?


I'd suggest testing the result of the call, rather than just looking for 
content, as an empty response could be a result of the server you're 
connecting to. (Unlikely in this case, but it happens once in a while, 
particularly if you turn off redirection or support caching.) 
Unfortunately, you might have to use LWP::UserAgent rather than 
LWP::Simple:


#!/bin/perl --

use strict; use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new( timeout => 60 );

my $response = $ua->get('http://www.npr.org/');
if ( $response->is_success() ) {
    my $content = $response->decoded_content();
    ...   # (placeholder: work with $content here)
} else {
    print "HTTP Error : ", $response->status_line(), "\n";
}

__END__

(And after changing the shebang line to my location of perl, your version 
worked via both CGI and the command line.)



Oh ... and you don't need the foreach loop (assigning the array to a scalar 
evaluates it in scalar context, which gives the line count):

my $i = @lines;

-Joe


Re: [CODE4LIB] Assigning DOI for local content

2009-11-23 Thread MJ Suhonos
Hi all, couldn't resist jumping in on this one:

 But it appears that the handle system is quite a bit more fleshed out than a 
 simple purl server, it's a distributed protocol-independent network.   The 
 protocol-independent part may or may not be useful, but it certainly seems 
 like it could be, it doesn't hurt to provide for it in advance. The 
 distributed part seems pretty cool to me.
 
 So if it's no harder to set up, maintain, and use a handle server than a Purl 
 server (this is a big 'if', I'm not sure if that's the case), and handle can 
 do everything purl can do and quite a bit more (I'm pretty sure that is the 
 case)... why NOT use handle instead of purl? It seems like handle is a more 
 fleshed out, robust, full-featured thing than purl.

I think it's also worth adding that handles (and DOIs) can also be used to 
create PURLs, eg. http://purl.org/handles/10.1074/jbc.M004545200 (which isn't a 
real link) -- in fact, there's no reason why you couldn't use a PURL server as 
a local handle resolver, aside from the fact that it wouldn't be participating 
in the handle network.  See, for example: 
http://www.ukoln.ac.uk/distributed-systems/poi/

One thing PURL has going for it is that it has defined meanings for HTTP 
response codes. These are similar to REST responses, though I don't know if 
they're the same; the most recent documentation mentions that PURL servers are 
RESTful, but I suspect this is part of the recent re-tooling of PURL.
http://purl.oclc.org/docs/help.html#rest

The only potential advantage of PURLs that I can see is the ability to do 
partial redirects, e.g. http://purl.org/redirect/xx -> 
http://some.server/long.path/x -- though one could make the case that this 
might be useful for directing handle requests to the appropriate servers, e.g.
http://purl.org/handles/10.123/xx -> http://handleserver1/xx and 
http://purl.org/handles/10.456/xx -> http://doiserver2/xx ...
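
A minimal sketch of that prefix-based routing in Perl, using the made-up 
prefixes and target hosts from the example above:

#!/usr/bin/perl
use strict; use warnings;

# Map each handle prefix to the resolver that should receive the request.
my %resolver_for = (
    '10.123' => 'http://handleserver1',
    '10.456' => 'http://doiserver2',
);

my $request = '10.123/xx';                      # incoming handle
my ($prefix, $rest) = split m{/}, $request, 2;  # prefix / local part
my $base = $resolver_for{$prefix}
    or die "No resolver registered for prefix $prefix\n";
print "$base/$rest\n";                          # -> http://handleserver1/xx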

Overall, I tend to agree that handles seem more flexible -- or at least, less 
tied to URL and HTTP -- than PURLs.  Not having to rely on a specific server 
for resolution is a fairly major bonus (think DNS-style round-robin resolver 
querying for handles; not possible with PURLs).

MJ

PS.  At the risk of reposting potentially old news:  
http://web.mit.edu/handle/www/purl-eval.html


Re: [CODE4LIB] Assigning DOI for local content

2009-11-23 Thread Ross Singer
On Mon, Nov 23, 2009 at 2:52 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 Well, here's the trick about handles, as I understand it.  A handle, for
 instance, a DOI, is 10.1074/jbc.M004545200.

Well, actually, it could be:
10.1074/jbc.M004545200
doi:10.1074/jbc.M004545200
info:doi/10.1074/jbc.M004545200

etc.  But there's still got to be some mechanism to get from there to:
http://dx.doi.org/10.1074/jbc.M004545200
or
http://dx.hellman.net/10.1074/jbc.M004545200

I don't see why it's any different, fundamentally, than:
http://purl.hellman.net/?purl=http%3A%2F%2Fpurl.org%2FNET%2Fdoi%2F10.1074%2Fjbc.M004545200

besides being prettier.
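
A sketch of the normalization step this implies -- reduce any of the three 
spellings to the bare handle, then prepend whichever gateway you like 
(dx.doi.org here, but it could just as easily be a private one):

#!/usr/bin/perl
use strict; use warnings;

for my $id ('10.1074/jbc.M004545200',
            'doi:10.1074/jbc.M004545200',
            'info:doi/10.1074/jbc.M004545200') {
    (my $handle = $id) =~ s{^(?:doi:|info:doi/)}{};  # strip the wrapper
    print "http://dx.doi.org/$handle\n";             # all three come out alike
}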

Anyway, my argument wasn't that Purl was technologically more sound
than handles -- Purl services have a major single-point-of-failure
problem -- it's just that I don't buy the argument that handles are
somehow superior because they aren't limited to HTTP.

What I'm saying is that there are plenty of valid reasons to value handles
more than purls (or any other indirection service), but independence
from HTTP isn't one of them.

-Ross.

 While, for DOI handles, normally we resolve that using dx.doi.org, at
 http://dx.doi.org/10.1074/jbc.M004545200, that is not actually a requirement
 of the handle system. You can resolve it through any handle server, over
 HTTP or otherwise. Even if it's still over HTTP, it doesn't have to be at
 dx.doi.org, it can be via any handle resolver.

 For instance, check this out, it works:

 http://hdl.handle.net/10.1074/jbc.M004545200

 'Cause the DOI is really just a subset of Handles, any resolver participating
 in the handle network can resolve em.  In Eric's hypothetical use case, that
 could be a local enterprise handle resolver of some kind. (Although I'm not
 totally sure that would keep your usage data private; the documentation I've
 seen compares the handle network to DNS, it's a distributed system, I'm not
 sure in what cases handle resolution requests are sent 'upstream' by the
 handle resolver, and if actual individual lookups are revealed by that or
 not. But in any case, when Ross suggests -- Presumably dx.hellman.net would
 need to harvest its metadata from somewhere, which seems like it would leave
 a footprint. It also needs some mechanism to stay in sync with the master
 index. -- my reading of this suggests this is _built into_ the handle
 protocol, it's part of handle from the very start (again, the DNS analogy,
 with the emphasis on the distributed resolution aspect), you don't need to
 invent it yourself. The details of exactly how it works, I don't know enough
 to say.  )

 Now, I'm somewhat new to this stuff too, I don't completely understand how
 it works.  Apparently hdl.handle.net can <strike>handle</strike> deal with
 any handle globally, while presumably dx.doi.org can only deal with the
 subset of handles that are also DOIs.  And apparently you can have a handle
 resolver that works over something other than HTTP too. (Although Ross
 argues, why would you want to? And I'm inclined to agree).

 But it appears that the handle system is quite a bit more fleshed out than a
 simple purl server, it's a distributed protocol-independent network.   The
 protocol-independent part may or may not be useful, but it certainly seems
 like it could be, it doesn't hurt to provide for it in advance. The
 distributed part seems pretty cool to me.

 So if it's no harder to set up, maintain, and use a handle server than a
 Purl server (this is a big 'if', I'm not sure if that's the case), and
 handle can do everything purl can do and quite a bit more (I'm pretty sure
 that is the case)... why NOT use handle instead of purl? It seems like
 handle is a more fleshed out, robust, full-featured thing than purl.

 Jonathan




 Presumably dx.hellman.net would need to
 harvest its metadata from somewhere, which seems like it would leave a
 footprint.  It also needs some mechanism to stay in sync with the
 master index.  Your non-resolution service also seems to be looking
 these things up in realtime.  Would a RESTful or SOAP API (*shudder*)
 not accomplish the same goal?

 Really, though, the binding argument here is less the issue here than
 if you believe http URIs are valid identifiers or not since there's no
 reason a URI couldn't be dereferenced via other bindings, either.

 -Ross.





Re: [CODE4LIB] Assigning DOI for local content

2009-11-23 Thread Jonathan Rochkind
The actual handle is "10.1074/jbc.M004545200".  If your software 
wants to give a handle to any handle resolver of its 
choice, it's going to have to parse the doi: or info: versions to 
get the handle out first.  The info: version is a URI that has a DOI 
handle embedded in it.  The doi: version is... um, I dunno, just a 
convention, I think, that has a DOI handle embedded in it.


Likewise, if your software had a URI, and was smart enough to know that 
the URI "http://dx.doi.org/10.1074/jbc.M004545200" actually had a handle 
embedded in it, it could strip the handle out, and then resolve it 
against some other handle server that participates in the handle 
network, like hdl.handle.net.  But that would be kind of going against 
the principle of treating URIs as opaque identifiers and not parsing them 
for internal data.
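
The stripping itself is a one-line regex; a sketch, recognizing just the two 
resolver hosts named here:

#!/usr/bin/perl
use strict; use warnings;

my $uri = 'http://dx.doi.org/10.1074/jbc.M004545200';
# Pull the bare handle out of a known resolver URI...
my ($handle) = $uri =~ m{^http://(?:dx\.doi\.org|hdl\.handle\.net)/(.+)$};
# ...and hand it to a different resolver of your choice.
print "http://hdl.handle.net/$handle\n" if defined $handle;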


But me, I end up going against that principle all the time in actual 
practice, actually for scenarios kind of analogous to, but less 
well-defined and spec'd than, getting the actual handle out of the URI 
and resolving it against some other service. For instance, getting an 
OCLCnum out of an http://worldcat.oclc.org/ URI, to resolve against my 
local catalog that knows something about OCLCnums, but doesn't know 
anything about http://worldcat.oclc.org URIs that happen to have an 
OCLCnum embedded in them. Or getting an ASIN out of an 
http://www.amazon.com/ URI, to resolve against Amazon's _own_ web 
services, which ironically know something about ASINs but don't know 
anything about www.amazon.com URIs that have an ASIN embedded in them.  
Actually quite analogous to getting the actual handle out of an 
http://dx.doi.org or http://hdl.handle.net URI, in order to resolve 
against the resolver of choice.


Jonathan

Ross Singer wrote:

On Mon, Nov 23, 2009 at 2:52 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

  

Well, here's the trick about handles, as I understand it.  A handle, for
instance, a DOI, is 10.1074/jbc.M004545200.



Well, actually, it could be:
10.1074/jbc.M004545200
doi:10.1074/jbc.M004545200
info:doi/10.1074/jbc.M004545200

etc.  But there's still got to be some mechanism to get from there to:
http://dx.doi.org/10.1074/jbc.M004545200
or
http://dx.hellman.net/10.1074/jbc.M004545200

I don't see why it's any different, fundamentally, than:
http://purl.hellman.net/?purl=http%3A%2F%2Fpurl.org%2FNET%2Fdoi%2F10.1074%2Fjbc.M004545200

besides being prettier.

Anyway, my argument wasn't that Purl was technologically more sound
than handles -- Purl services have a major single-point-of-failure
problem -- it's just that I don't buy the argument that handles are
somehow superior because they aren't limited to HTTP.

What I'm saying is that there are plenty of valid reasons to value handles
more than purls (or any other indirection service), but independence
from HTTP isn't one of them.

-Ross.

  

While, for DOI handles, normally we resolve that using dx.doi.org, at
http://dx.doi.org/10.1074/jbc.M004545200, that is not actually a requirement
of the handle system. You can resolve it through any handle server, over
HTTP or otherwise. Even if it's still over HTTP, it doesn't have to be at
dx.doi.org, it can be via any handle resolver.

For instance, check this out, it works:

http://hdl.handle.net/10.1074/jbc.M004545200

'Cause the DOI is really just a subset of Handles, any resolver participating
in the handle network can resolve em.  In Eric's hypothetical use case, that
could be a local enterprise handle resolver of some kind. (Although I'm not
totally sure that would keep your usage data private; the documentation I've
seen compares the handle network to DNS, it's a distributed system, I'm not
sure in what cases handle resolution requests are sent 'upstream' by the
handle resolver, and if actual individual lookups are revealed by that or
not. But in any case, when Ross suggests -- Presumably dx.hellman.net would
need to harvest its metadata from somewhere, which seems like it would leave
a footprint. It also needs some mechanism to stay in sync with the master
index. -- my reading of this suggests this is _built into_ the handle
protocol, it's part of handle from the very start (again, the DNS analogy,
with the emphasis on the distributed resolution aspect), you don't need to
invent it yourself. The details of exactly how it works, I don't know enough
to say.  )

Now, I'm somewhat new to this stuff too, I don't completely understand how
it works.  Apparently hdl.handle.net can <strike>handle</strike> deal with
any handle globally, while presumably dx.doi.org can only deal with the
subset of handles that are also DOIs.  And apparently you can have a handle
resolver that works over something other than HTTP too. (Although Ross
argues, why would you want to? And I'm inclined to agree).

But it appears that the handle system is quite a bit more fleshed out than a
simple purl server, it's a distributed protocol-independent network.   The
protocol-independent part may or may not be useful, 

Re: [CODE4LIB] Assigning DOI for local content

2009-11-23 Thread Tom Keays
Interesting stuff. I never really thought about it before that DOIs
can be served up by the Handle server. E.G.,

http://dx.doi.org/10.1074/jbc.M004545200 =
http://hdl.handle.net/10.1074/jbc.M004545200

But, even more surprising to me was realizing that Handles can be
resolved by the DOI server. Or presumably any DOI server.

http://hdl.handle.net/2027.42/46087 = http://dx.doi.org/2027.42/46087

I suppose I should have understood this point since the Handle service
does sort of obliquely say this.

http://www.handle.net/factsheet.html

Anyway, good to have it made explicit.

Tom

On Mon, Nov 23, 2009 at 4:03 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 The actual handle is "10.1074/jbc.M004545200".  If your software wants to
 give a handle to any handle resolver of its choice, it's going
 to have to parse the doi: or info: versions to get the handle out first.
  The info: version is a URI that has a DOI handle embedded in it.  The doi:
 version is... um, I dunno, just a convention, I think, that has a DOI handle
 embedded in it.

 Likewise, if your software had a URI, and was smart enough to know that the
 URI "http://dx.doi.org/10.1074/jbc.M004545200" actually had a handle
 embedded in it, it could strip the handle out, and then resolve it against
 some other handle server that participates in the handle network, like
 hdl.handle.net.  But that would be kind of going against the principle of
 treating URIs as opaque identifiers and not parsing them for internal data.

 But me, I end up going against that principle all the time in actual
 practice, actually for scenarios kind of analogous to, but less well-defined
 and spec'd than, getting the actual handle out of the URI and resolving it
 against some other service. For instance, getting an OCLCnum out of an
 http://worldcat.oclc.org/ URI, to resolve against my local catalog that
 knows something about OCLCnums, but doesn't know anything about
 http://worldcat.oclc.org URIs that happen to have an OCLCnum embedded in
 them. Or getting an ASIN out of an http://www.amazon.com/ URI, to resolve
 against Amazon's _own_ web services, which ironically know something about
 ASINs but don't know anything about www.amazon.com URIs that have an ASIN
 embedded in them.  Actually quite analogous to getting the actual handle out
 of an http://dx.doi.org or http://hdl.handle.net URI, in order to resolve
 against the resolver of choice.

 Jonathan

 Ross Singer wrote:

 On Mon, Nov 23, 2009 at 2:52 PM, Jonathan Rochkind rochk...@jhu.edu
 wrote:



 Well, here's the trick about handles, as I understand it.  A handle, for
 instance, a DOI, is 10.1074/jbc.M004545200.


 Well, actually, it could be:
 10.1074/jbc.M004545200
 doi:10.1074/jbc.M004545200
 info:doi/10.1074/jbc.M004545200

 etc.  But there's still got to be some mechanism to get from there to:
 http://dx.doi.org/10.1074/jbc.M004545200
 or
 http://dx.hellman.net/10.1074/jbc.M004545200

 I don't see why it's any different, fundamentally, than:

 http://purl.hellman.net/?purl=http%3A%2F%2Fpurl.org%2FNET%2Fdoi%2F10.1074%2Fjbc.M004545200

 besides being prettier.

 Anyway, my argument wasn't that Purl was technologically more sound
 than handles -- Purl services have a major single-point-of-failure
 problem -- it's just that I don't buy the argument that handles are
 somehow superior because they aren't limited to HTTP.

 What I'm saying is that there are plenty of valid reasons to value handles
 more than purls (or any other indirection service), but independence
 from HTTP isn't one of them.

 -Ross.



 While, for DOI handles, normally we resolve that using dx.doi.org, at
 http://dx.doi.org/10.1074/jbc.M004545200, that is not actually a
 requirement
 of the handle system. You can resolve it through any handle server, over
 HTTP or otherwise. Even if it's still over HTTP, it doesn't have to be at
 dx.doi.org, it can be via any handle resolver.

 For instance, check this out, it works:

 http://hdl.handle.net/10.1074/jbc.M004545200

 'Cause the DOI is really just a subset of Handles, any resolver
 participating
 in the handle network can resolve em.  In Eric's hypothetical use case,
 that
 could be a local enterprise handle resolver of some kind. (Although I'm
 not
 totally sure that would keep your usage data private; the documentation
 I've
 seen compares the handle network to DNS, it's a distributed system, I'm
 not
 sure in what cases handle resolution requests are sent 'upstream' by the
 handle resolver, and if actual individual lookups are revealed by that or
 not. But in any case, when Ross suggests -- Presumably dx.hellman.net
 would
 need to harvest its metadata from somewhere, which seems like it would
 leave
 a footprint. It also needs some mechanism to stay in sync with the master
 index. -- my reading of this suggests this is _built into_ the handle
 protocol, it's part of handle from the very start (again, the DNS
 analogy,
 with the emphasis on the distributed resolution aspect), you don't need
 to
 invent 

Re: [CODE4LIB] Assigning DOI for local content

2009-11-23 Thread Ben O'Steen
What happens if the main doi resolver goes down? I'd be interested to see
how well a local resolver works when blocked from this upstream server. Are
there any other upstream servers?

Ben

On Nov 23, 2009 10:10 PM, Tom Keays tomke...@gmail.com wrote:

Interesting stuff. I never really thought about it before that DOIs
can be served up by the Handle server. E.G.,

http://dx.doi.org/10.1074/jbc.M004545200 =

http://hdl.handle.net/10.1074/jbc.M004545200
But, even more surprising to me was realizing that Handles can be
resolved by the DOI server. Or presumably any DOI server.

http://hdl.handle.net/2027.42/46087 = http://dx.doi.org/2027.42/46087

I suppose I should have understood this point since the Handle service
does sort of obliquely say this.

http://www.handle.net/factsheet.html

Anyway, good to have it made explicit.

Tom

On Mon, Nov 23, 2009 at 4:03 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 The actual handle ...


Re: [CODE4LIB] Assigning DOI for local content

2009-11-23 Thread Jonathan Rochkind
What do you mean by a local resolver?  If you're talking about a local 
handle resolver adhering to the handle spec... well, then it depends on 
the handle spec I guess, which I don't know. But since all the handle 
documentation keeps saying "like DNS", I'd imagine it has redundancy 
built into it similar to (or better than) DNS's. But I don't know.


Poking around on handle.net, it looks like the handle infrastructure 
supports this, but you would have had to actually configure 'backup' 
handle resolvers -- similar to DNS in that if the DNS for your domain 
goes down, and you _haven't_ gotten someone else at another location to 
be a 'backup' resolver for you, and specified them as a nameserver in 
your DNS record... then you're out of luck. But the protocol supports 
that, and if you have done it (as most everyone does with DNS), you're 
good.


I have no idea if 'most everyone' does it with handle or not, but handle 
supports it. Note that if dx.doi.org goes down, you obviously won't be 
able to resolve at dx.doi.org -- but IF it works as I think (I'm still 
confused), AND dx.doi.org has distributed their handles to a backup 
resolver, then you'd still be able to resolve via hdl.handle.net, or via 
your own local handle resolver (which will in turn find the backup 
resolver).


http://www.handle.net/lhs.html

Jonathan

Ben O'Steen wrote:

What happens if the main doi resolver goes down? I'd be interested to see
how well a local resolver works when blocked from this upstream server. Are
there any other upstream servers?

Ben

On Nov 23, 2009 10:10 PM, Tom Keays tomke...@gmail.com wrote:

Interesting stuff. I never really thought about it before that DOIs
can be served up by the Handle server. E.G.,

http://dx.doi.org/10.1074/jbc.M004545200 =

http://hdl.handle.net/10.1074/jbc.M004545200
But, even more surprising to me was realizing that Handles can be
resolved by the DOI server. Or presumably any DOI server.

http://hdl.handle.net/2027.42/46087 = http://dx.doi.org/2027.42/46087

I suppose I should have understood this point since the Handle service
does sort of obliquely say this.

http://www.handle.net/factsheet.html

Anyway, good to have it made explicit.

Tom

On Mon, Nov 23, 2009 at 4:03 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
  

The actual handle ...



  


Re: [CODE4LIB] Assigning DOI for local content

2009-11-23 Thread Jonathan Rochkind

More info here too:

http://www.handle.net/introduction.html

This handle stuff is interesting, but I don't entirely understand it.

I guess if the Global Handle Service really went down, it would be 
similar to a root-level DNS server going down -- you'd be in trouble, 
somewhat mitigated by whatever data your local resolver had cached.


Of course, CNRI maintains several failover mirrors of the Global Handle 
Service for that reason. (Much as we'd hope all the root-level DNS 
servers are thoroughly failover-ed).


Jonathan

Ben O'Steen wrote:

What happens if the main doi resolver goes down? I'd be interested to see
how well a local resolver works when blocked from this upstream server. Are
there any other upstream servers?

Ben

On Nov 23, 2009 10:10 PM, Tom Keays tomke...@gmail.com wrote:

Interesting stuff. I never really thought about it before that DOIs
can be served up by the Handle server. E.G.,

http://dx.doi.org/10.1074/jbc.M004545200 =

http://hdl.handle.net/10.1074/jbc.M004545200
But, even more surprising to me was realizing that Handles can be
resolved by the DOI server. Or presumably any DOI server.

http://hdl.handle.net/2027.42/46087 = http://dx.doi.org/2027.42/46087

I suppose I should have understood this point since the Handle service
does sort of obliquely say this.

http://www.handle.net/factsheet.html

Anyway, good to have it made explicit.

Tom

On Mon, Nov 23, 2009 at 4:03 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
  

The actual handle ...



  


Re: [CODE4LIB] Web analytics for POST data

2009-11-23 Thread Yitzchak Schaffer

Alejandro Garza Gonzalez wrote:
1) You *can* use GA and some Javascript embedded in your III pages to 
log events (as they're called in GA lingo). The javascript (depending 
on your coding wizardry level) could track anything from hovers over 
elements to form submissions, next-page events, etc.


Hi Alejandro,

Thanks for a great suggestion.  I tried poking around at it; it seems to 
me like Events aren't built for what I'm really interested in doing, 
namely systematic exploration and analysis of the search sessions.  IOW, 
let's say a form looks like


t=finn
a=twain
l=circ,reserve

It looks like I could log this as three separate events, or one; but 
either way, how would one analyze this?  I'm not interested (solely) in 
how many times this particular query was entered.


I started looking at ways to funnel the params into my own tracking 
script, the prototype of which just writes a line to a text file with a 
JSON serialization of the form data; but I'm not a JS ninja, so I'm 
still trying to figure out how to get around the XSS problems.
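
For reference, a sketch of what such a tracking endpoint could look like as 
a plain CGI script (not the actual prototype; the log path is made up, and 
the stock CGI and JSON modules are assumed):

#!/usr/bin/perl
use strict; use warnings;
use CGI;
use JSON;

my $q = CGI->new;
# Collect every submitted parameter (multi-valued ones become arrays).
my %params = map { $_ => [ $q->param($_) ] } $q->param;

# Append one JSON line per search to a flat log file.
open my $log, '>>', '/tmp/search-log.jsonl' or die "open: $!";
print {$log} encode_json(\%params), "\n";
close $log;

print $q->header('text/plain'), "logged\n";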


Ruddy III turnkey...

--
Yitzchak Schaffer
Systems Manager
Touro College Libraries
33 West 23rd Street
New York, NY 10010
Tel (212) 463-0400 x5230
Fax (212) 627-3197
Email yitzchak.schaf...@tourolib.org