Re: [CODE4LIB] looking for free hosting for html code

2015-05-22 Thread Joe Hourcle

On Fri, 22 May 2015, Sarles Patricia (18K500) wrote:

[trimmed]


I plan to teach coding to my 6th and 12th grade students next school year and 
our lab has a mixture of old (2008) and new Macs (2015) so I want to make all 
the Macs functional for writing code in an editor.

My next question is this:

I am familiar with free Web creation and hosting sites like Weebly, Wix, 
Google sites, Wikispaces, WordPress, and Blogger, but do you know of any 
free hosting sites that will allow you to plug in your own code, i.e. 
host your own HTML files?


If it's straight HTML, and doesn't need any sort of text pre-processing 
(SSI, ASP, JSP, PHP, ColdFusion, etc.), I think that you can use Google 
Drive.  This help page seems to suggest that's true:


https://support.google.com/drive/answer/2881970?hl=en

With all static files, it might also be possible to lay things out so that 
you could serve it through GitHub or similar.  (And teaching them about 
version control isn't a bad idea, either.)


-Joe


Re: [CODE4LIB] oops (was: [CODE4LIB] Any Apache mod_rewrite experts out there?)

2015-05-18 Thread Joe Hourcle

On Mon, 18 May 2015, Karl Holten wrote:


As Joe cautioned might happen with the N directive, I get an infinite loop from this. 
It keeps prepending the domain name, so after the second pass you get 
http://www.google-com.topcat.switchinc.org/http://www.google-com.topcat.switchinc.org// 
and after the third pass you get the full domain three times, and 
so on.

My not-so-genius solution to this problem is just to put in the same 
RedirectRule 6 times in a row and skip the N directive.



If you know how many you're dealing with, but don't want the request to 
pass through quite as many rules, you can use a power-of-two trick:


1. Have a rule that replaces 4 dashes
2. Have a rule that replaces 2 dashes
3. Have a rule that replaces 1 dash

it'll then handle between 0 and 7 dashes.

At six dashes, this might not be worth it ... once you start getting above 
a dozen (which hopefully you'll never have to deal with), it might be.
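A sketch of that power-of-two approach in mod_rewrite, using the RewriteCond/RewriteRule pattern from elsewhere in this thread.  This is untested; the ezproxy.switchinc.org hostnames are the ones under discussion, and each matching rule issues a redirect so the next pass arrives as a fresh request:

```apache
RewriteEngine on

# Rule 1: convert 4 dashes to periods in one pass
RewriteCond %{HTTP_HOST} ^(.*)-(.*)-(.*)-(.*)-(.*)ezproxy\.switchinc\.org [NC]
RewriteRule ^(.*) http://%1.%2.%3.%4.%5ezproxy.switchinc.org/$1 [R,L]

# Rule 2: convert 2 dashes
RewriteCond %{HTTP_HOST} ^(.*)-(.*)-(.*)ezproxy\.switchinc\.org [NC]
RewriteRule ^(.*) http://%1.%2.%3ezproxy.switchinc.org/$1 [R,L]

# Rule 3: convert 1 dash
RewriteCond %{HTTP_HOST} ^(.*)-(.*)ezproxy\.switchinc\.org [NC]
RewriteRule ^(.*) http://%1.%2ezproxy.switchinc.org/$1 [R,L]
```

A host with seven dashes matches rule 1 (four dashes fixed), then on the redirected request matches rule 2 (two more), then rule 3, so the worst case is three redirects instead of seven.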


-Joe


[CODE4LIB] oops (was: [CODE4LIB] Any Apache mod_rewrite experts out there?)

2015-05-18 Thread Joe Hourcle

On Mon, 18 May 2015, Joe Hourcle wrote:



 RewriteRule ^(.*)-(.*)ezproxy.switchinc.org/(.*) $1.$2ezproxy.switchinc.org/$3 [N]


I wasn't thinking ... RewriteRule doesn't operate on the full URL, only on 
the non-host portion.  You might need to do strange things w/ RewriteCond 
and RewriteRule instead:


  RewriteCond %{HTTP_HOST} ^(.*)-(.*)ezproxy.switchinc.org
  RewriteRule ^(.*) http://%1.%2ezproxy.switchinc.org/$1 [N]

The %1 and %2 come from the captures in RewriteCond, while the $1 comes 
from RewriteRule.


-Joe


Re: [CODE4LIB] Any Apache mod_rewrite experts out there?

2015-05-18 Thread Joe Hourcle

On Fri, 15 May 2015, Karl Holten wrote:


My organization is changing proxy servers from WAM to EZproxy, and we would 
like to give staff time to change over their URLs before we make the switch. I 
would like to set up forwarding so the links using the new proxy get redirected 
to the old proxy. I'm planning on using apache's mod_rewrite to do this.

Basically, this mod rewrite rule needs to do three things:
1) Change the part of the domain name (not the file path) that reads 
"ezproxy.switchinc.org" to our old domain "topcat.switchinc.org"
2) Append the prefix "0-" in front of the domain name
3) Transform dashes in the domain name to periods.

Could anyone provide me with some assistance? I am sure this takes maybe 
10 lines of code from a mod_rewrite expert, but it has taken me several 
weeks to get the first two objectives done, and all of my google-fu is 
failing me for objective three. Below is what I have:


RewriteEngine on
RewriteCond %{HTTP_HOST} ^(.*)ezproxy.switchinc.org [NC]
RewriteRule ^(.*) %{HTTP_HOST}/$1 [DPI]
RewriteRule ^(.*)ezproxy.switchinc.org/(.*) http://0-$1topcat.switchinc.org/$3 [L]



The problem with number three is that you don't know how many dashes there 
are that need to be changed to periods, so you have to use [N], and can't 
use it in a chain.  I'm actually going to re-arrange it to 1,3,2, so we 
can leave the [L] on the #2 (in case you had any other rules that might 
mess with it).


It also seems suspicious that you'd insert a '0-' when you're then going 
to be replacing all of the dashes with periods -- especially as I don't 
believe that '0' is a valid host name in DNS.  I've assumed that the '0-' 
stays ... otherwise, replace it with '0.'


Also note that I changed your #2 from using $3 to $2 -- you only had two 
sets of capturing parens.



  RewriteCond %{HTTP_HOST} ^(.*)ezproxy.switchinc.org [NC]
  RewriteRule ^(.*) %{HTTP_HOST}/$1 [DPI]

  RewriteRule ^(.*)-(.*)ezproxy.switchinc.org/(.*) $1.$2ezproxy.switchinc.org/$3 [N]

  RewriteRule ^(.*)ezproxy.switchinc.org/(.*) http://0-$1topcat.switchinc.org/$2 [L]

If we assume that the FQDN only contains numbers, letters, dashes and 
periods (no underscores or other characters), you might check how the 
first one compares to:


  RewriteRule ^([a-zA-Z0-9\.]+)-([a-zA-Z0-9\.\-]*)ezproxy.switchinc.org/(.*) $1.$2ezproxy.switchinc.org/$3 [N]


... and note that I haven't tested this -- it should work, but with [N], 
you run the risk of endless loops.



-Joe


ps. Back in the days when I used to manage the spam filters for an
ISP, domain names that tried to look like IP addresses were one of my
rejection rules, and it's possible that your site might look like one:

://(\d+\.){4}

I know, you're wondering why I don't have http:// or https?:// ...
because this was so old, people were still hosting spam on FTP
servers.  Also note that there is (was?) an ad company that was
serving ads from something that looked like a numeric (1o8 ? lo8) and
then prepended 3 numeric blocks before it, so it looked like an IP
address, but wouldn't have been caught by this rule.


Re: [CODE4LIB] free html editors

2015-05-17 Thread Joe Hourcle

On Sat, 16 May 2015, Miles Fidelman wrote:

[trimmed]

Your real problem might be running a browser that's new enough to support 
HTML5 and CSS3.  Otherwise, editing HTML isn't going to do you much good.


Apple won't let the most recent version of Safari run on 10.6.8 (you're 
stuck at 5.1.10), but Firefox (38.0.1) and Chrome (42.0.2311.152) are both 
fine.


-Joe


Re: [CODE4LIB] free html editors

2015-05-16 Thread Joe Hourcle

On Sat, 16 May 2015, Nathan Rogers wrote:

If you do not need all the bells and whistles I would recommend 
TextWrangler. Free versions should still be available online and its 
bigger brother BBEdit is overkill for basic web editing.


Actually, the significant difference between TextWrangler and BBEdit is 
that BBEdit has a number of features specifically for web design that 
don't exist in TextWrangler.


Looking at the version of BBEdit 9.1 that I have installed, the majority 
of it is in the 'Markup' menu:


* Close current tag / Balance tags
* Check syntax
* Check links
* Check accessibility
* Cleaners for GoLive/PageMill/HomePage/DreamWeaver
* Convert to HTML / XHTML
* Menu items to insert tags (which then give what attributes are allowed)
* Menu item to insert CSS
* Preview in ... (gives a list of installed web browsers)

...

That said, TextWrangler is still a good free editor -- and I personally 
rarely ever use the insert tags/CSS items (as I've been writing HTML for 
... crap ... I feel old ... 20+ years).


But to say that BBEdit is overkill for web editing is just wrong -- the 
majority of the feature differences are *specifically* for web editing.


-Joe

(disclaimer: for a decade or so, I was a beta tester for Bare Bones.  I 
haven't been using the latest-and-greatest version in a while, as I prefer 
not to install newer versions of Mac OS X on my personal systems ... 
basically, since Apple decided to bring all of the iOS annoyances into the 
desktop.  As such, I can't install BBEdit 10 or 11 to see what the 
differences are in more recent versions)




-Original Message-
From: "Sarles Patricia (18K500)" 
Sent: ?5/?16/?2015 10:21 AM
To: "CODE4LIB@LISTSERV.ND.EDU" 
Subject: [CODE4LIB] free html editors

I just this minute subscribed to this list after reading Andromeda Yelton's 
column in American Libraries from yesterday with great interest since I would 
like to teach coding in my high school library next year.

I purchased Andy Harris' HTML5 and CSS3 All-in-One For Dummies for my summer 
reading and the free HTML editors he mentions in the book are either not really 
free or are not compatible with my lab's 2008 Macs.

Can anyone recommend a free HTML editor for older Macs?

Many thanks and happy to be on this list,
Patricia



Patricia Sarles, MA (Anthropology), MLS
Librarian
Jerome Parker Campus Library
100 Essex Drive
Staten Island, NY 10314
718-370-6900 x1322
psar...@schools.nyc.gov
http://jeromeparkercampus.libguides.com/home

You can tell whether a man is clever by his answers. You can tell whether a man 
is wise by his questions. - Naguib Mahfouz

As a general rule the most successful man in life is the man who has the best 
information. - Benjamin Disraeli



Re: [CODE4LIB] pdf and web publishing question

2015-04-29 Thread Joe Hourcle

On Wed, 29 Apr 2015, Sergio Letuche wrote:


Dear all,

we have a PDF, taken from a to-be-printed PDF, full of tables.  The
text is split into two columns.  How would you suggest we upload this PDF to
the web?  We would like to keep the structure, and split each section taken
from the table of contents into its own page, but also keep the format, and if
possible, serve the content both in an HTML view and in a PDF view, based
on the preference of the user.


The last time I spoke to someone from AAS about how they extracted their 
'Data Behind the Table' (aka 'DbT'), it was mostly dependent upon getting 
something from the author while it was still in a useful format.




The document is made with InDesign CS6, and I do not know which format I
could transform it into.


There are a few ways to do tables in InDesign, as it's page layout 
software.  If it's in a single table within a text block, and there's 
nothing strange within each cell, you should be able to just select the 
table, copy it, and paste it out into a text editor.  You'll get line 
returns between each row, and tabs between each cell.


If they've placed line returns within the cells, those will get pasted in 
the middle of the cell, which can really screw you up.


For cases like that, it's sometimes easiest to go through the file and 
paste HTML elements at the beginning of each cell to mark table cells 
(<td>), so when you export, you have markers as to which are legitimate 
changes in cells, and which are line returns in the file.


I then do post-processing to add in the closing cells and the row markers.

If I were using BBEdit, I'd do:

Find:
\t
Replace:
</td><td>

Find:
\r
Replace:
</td></tr>\r<tr><td>

If you're doing it in some other editor that supports search/replace, you 
should be able to do similar, but you might need to figure out how to 
specify tabs & line returns in your program.


... and then fix the initial & final lines.  (and maybe convert some of 
the <td>s into <th>s)
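If you'd rather script it than run editor search-and-replace passes, here's a rough Python equivalent of the same transformation.  It's only an illustration: it assumes the simple case of tab-separated cells and carriage-return-separated rows, as pasted out of InDesign, with no line returns inside cells.

```python
def table_from_paste(text):
    """Convert tab/CR-delimited text (as pasted from an InDesign table)
    into a simple HTML table."""
    rows = []
    for line in text.strip("\r").split("\r"):           # one line per table row
        cells = "".join(f"<td>{c}</td>" for c in line.split("\t"))
        rows.append(f"<tr>{cells}</tr>")
    # wrapping in <table> takes care of "fixing the initial & final lines"
    return "<table>\n" + "\n".join(rows) + "\n</table>"

sample = "Name\tValue\rfoo\t1\rbar\t2"
print(table_from_paste(sample))
```

Converting header cells to `<th>` (or handling embedded line returns) would still be a manual step, just as in the editor workflow.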


-Joe


ps.  after getting in trouble last week, I should mention that all
 statements are my own, and I don't represent NASA or any other
 organizations in this matter.


Re: [CODE4LIB] Data Lifecycle Tracking & Documentation Tools

2015-03-13 Thread Joe Hourcle

On Wed, 11 Mar 2015, davesgonechina wrote:


Hi John,

Good question - we're taking in XLS, CSV, JSON, XML, and on a bad day PDF
of varying file sizes, each requiring different transformation and audit
strategies, on both regular and irregular schedules. New batches often
feature schema changes requiring modification to ingest procedures, which
we're trying to automate as much as possible but obviously require a human
chaperone.

Mediawiki is our default choice at the moment, but then I would still be
looking for a good workflow management model for the structure of the wiki,
especially since in my experience wikis are often a graveyard for the best
intentions.



A few places that you might try asking this question again, to see if you 
can find a solution that better answers your question:



The American Society for Information Science & Technology's Research Data 
Access & Preservation group.  It has a lot of librarians & archivists in 
it, as well as people from various research disciplines:


http://mail.asis.org/mailman/listinfo/rdap
http://www.asis.org/rdap/

...

The Research Data Alliance has a number of groups that might be relevant. 
Here are a few that I suspect are the best fit:


Libraries for Research Data IG
https://rd-alliance.org/groups/libraries-research-data.html

Reproducibility IG
https://rd-alliance.org/groups/reproducibility-ig.html

Research Data Provenance IG
https://rd-alliance.org/groups/research-data-provenance.html

Data Citation WG
(as this fits into their 'dynamic data' problem)
https://rd-alliance.org/groups/data-citation-wg.html

('IG' is 'Interest Group', which are long-lived.  'WG' is 'Working Group' 
which are formed to solve a specific problem and then disband)


The group 'Publishing Data Workflows' might seem to be appropriate but 
it's actually 'Workflows for Publishing Data' not 'Publishing of Data 
Workflows' (which falls under 'Data Provenance' and 'Data Citation')


There was a presentation at the meeting earlier this week by Andreas 
Rauber in the Data Citation group, on workflows using git or SQL databases 
to track appends or modifications to CSV and similar ASCII files.


...

Also, I would consider this to be on-topic for Stack Exchange's "Open 
Data" site  (and I'm one of the moderators for the site):


http://opendata.stackexchange.com/

-Joe






On Tue, Mar 10, 2015 at 8:10 PM, Scancella, John  wrote:


Dave,

How are you getting the metadata streams? Are they actual stream objects,
or files, or database dumps, etc?

As for the tools, I have used a number of the ones you listed below. I
personally prefer JIRA (and it is free for non-profit). If you are ok if
editing in wiki syntax I would recommend mediaWiki (it is what powers
Wikipedia). You could also take a look at continuous deployment
technologies like Virtual Machines (virtualbox), linux containers (docker),
and rapid deployment tools (ansible, salt). Of course if you are doing lots
of code changes you will want to test all of this continually (Jenkins).

John Scancella
Library of Congress, OSI

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
davesgonechina
Sent: Tuesday, March 10, 2015 6:05 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Data Lifecycle Tracking & Documentation Tools

Hi all,

One of my projects involves harvesting, cleaning and transforming steady
streams of metadata from numerous publishers. It's an infinite loop but
every cycle can be a little bit or significantly different. Many issue
tracking tools are designed for a linear progression that ends in
deployment, not a circular workflow, and I've not hit upon a tool or use
strategy that really fits.

The best illustration I've found so far of the type of workflow I'm
talking about is the DCC Curation Lifecycle Model:
http://www.dcc.ac.uk/sites/default/files/documents/publications/DCCLifecycle.pdf

Here are some things I've tried or thought about trying:

   - Git comments
   - Github Issues
   - MySQL comments
   - Bash script logs
   - JIRA
   - Trac
   - Trello
   - Wiki
   - Unfuddle
   - Redmine
   - Zendesk
   - Request Tracker
   - Basecamp
   - Asana

Thoughts?

Dave





Re: [CODE4LIB] Get It Services / Cart

2015-03-06 Thread Joe Hourcle

On Fri, 6 Mar 2015, Smith, Steelsen wrote:


Hi All,

I'm new to this list, so if there are any conventions I'm ignoring I'd 
appreciate someone letting me know.


I'm working on a project to allow requests that will go to multiple 
systems to be aggregated in a requesting interface. It would be 
implemented as an independent application, allow a "shopping list" of 
items to be added, and be able to perform some back end business logic 
(availability checking, metadata enrichment, etc.).


This seems like a very common use case so I'm surprised that I've had 
trouble finding anyone who has published an application that works like 
this - the closest I've found being Umlaut which doesn't seem to support 
multiple simultaneous requesting (although I couldn't get as far as 
"request" in any sample system to be certain). Is anyone on the list 
aware of such a project?



I'm aware of such a project.  And it's been the bane of my existence for 
5+ years.  I've actually asked my boss to fire me a few times so that I 
don't have to support it, as it's more like babysitting than anything 
else.


However, it's for science archives, not libraries, and only really 
supports objects that are stored in FITS (Flexible Image Transport 
System).


I cannot in good faith recommend that anyone use it.  I've even started up 
a mailing list for IT people in solar physics archives so that I can try 
to make sure that we fight against implementing it for any new scientific 
missions.


-Joe

ps. It's not an independent application ... it's the service that does 
the 'metadata enrichment', because they store all of the data without any 
metadata, so no one not running their custom software can actually 
make use of it ... and then I manage the system that does the 
aggregation, and someone else wrote the logic for availability checking 
(which seems to have decided to crap itself last month, shortly after the 
programmer who wrote it 5+ years ago moved on to another job).


pps.  If you're going to implement something like this, I'd recommend 
using Metalink for the 'shopping cart' sort of stuff, and hand off to some 
dedicated download manager.  For our community, an even better option 
would be BagIt with a fetch.txt file, but the client-side tool support 
just isn't out there.
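For reference, a BagIt fetch.txt is just whitespace-separated lines of URL, length in bytes ('-' if unknown), and the path the file should occupy inside the bag.  A made-up example (the URLs and filenames here are hypothetical):

```
https://archive.example.org/data/aia_20150306.fits 1048576 data/aia_20150306.fits
https://archive.example.org/data/hmi_20150306.fits -       data/hmi_20150306.fits
```

A bag-aware client would fetch each listed file into place before validating the bag's manifests.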


[CODE4LIB] Fwd: [CNI-ANNOUNCE] Call for Participation: Security and Privacy Agenda Workshop, March 3, 2015

2015-02-06 Thread Joe Hourcle
I saw 'hardening OAI-PMH', and thought this might be of interest to this group.

-Joe



Begin forwarded message:

> From: "Clifford Lynch" 
> Date: February 6, 2015 4:16:15 PM EST
> To: "CNI-ANNOUNCE -- News from the Coalition" 
> Subject: [CNI-ANNOUNCE] Call for Participation: Security and Privacy Agenda 
> Workshop, March 3, 2015
> Reply-To: "CNI-ANNOUNCE -- News from the Coalition" 
> 
> On March 3 CNI is going to host a small workshop to develop a near term 
> agenda for work needed to improve security and privacy in systems related to 
> scholarly communication and access to scholarly information resources. The 
> focus will be largely technical, and will emphasize setting an agenda for 
> various groups to address needs and problems, rather than details of how to 
> solve specific problems. I've deliberately left the agenda scoped rather 
> broadly, and I want to look at everything from encouraging wider and more 
> routine use of HTTPS to hardening some popular protocols like OAI-PMH. 
> Technical identity management related issues are also in scope, as are some 
> discussions about appropriate levels of assurance.
> 
> We'll meet in Washington DC from 10AM-3PM on Tuesday, March 3. CNI will 
> provide refreshments and lunch, but we will not cover travel expenses.
> 
> This will be a small workshop, and we will do our best to balance for 
> different perspectives. If you are interested in attending, please send  an 
> email to Joan Lippincott  (j...@cni.org) with a brief summary of the 
> expertise and perspective you would bring to the meeting. Given that the 
> meeting is only about a month away, I'll send out a first batch of 
> acceptances by Feb 13, and after that respond to later applications as they 
> come in. I'll provide more detailed logistical information with acceptances.
> 
> There will be a public report from the meeting, and for those who cannot 
> attend, suggestions and comments are welcome going into the meeting.
> 
> If you have questions, please be in touch with me by email.
> 
> Clifford Lynch
> Director, CNI
> cl...@cni.org
> 
> 


Re: [CODE4LIB] Checksums for objects and not embedded metadata

2015-01-24 Thread Joe Hourcle
On Jan 23, 2015, at 5:35 PM, Kyle Banerjee wrote:

> Howdy all,
> 
> I've been toying with the idea of embedding DOI's in all our digital assets
> and possibly inserting/updating other metadata as well. However, doing this
> would alter checksums created using normal methods.
> 
> Is there a practical/easy way to checksum only the objects themselves
> without the metadata? If the metadata in a tiff or other kind of file is
> modified, it does nothing to the actual object. Since providing more
> complete metadata within objects makes them more usable/identifiable and
> might simplify migrations down the road, it seems like this wouldn't be a
> bad way to go.


The only file format that I'm aware of that has a provision for this 
is FITS (Flexible Image Transport System), which has the concepts of a 
'CHECKSUM' and a 'DATASUM'.  (the 'DATASUM' is the checksum for only
the payload portion, while the 'CHECKSUM' includes the metadata)[1].  It's
possible that there are others, but I suspect that most consumer
file formats won't have specific provisions for this.

The problem with 'metadata' in a lot of file formats is that it's
stored in arbitrary segments -- you'd have to have a program that knew
which segments were considered 'headers' vs. not.  It might be easier
to compute a separate checksum for each segment, so that should the
modifications change their order, they'd still be considered valid.
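That per-segment idea can be sketched generically.  This assumes a format-aware parser has already split the file into named (name, bytes) segments -- which is the hard part -- and it uses SHA-256 as a stand-in, not the actual FITS checksum algorithm:

```python
import hashlib

def segment_checksums(segments):
    """Checksum each (name, bytes) segment separately, plus a payload-only
    digest, so metadata edits don't invalidate the data checksum."""
    sums = {name: hashlib.sha256(data).hexdigest() for name, data in segments}
    payload = b"".join(data for name, data in segments if name == "data")
    sums["DATASUM"] = hashlib.sha256(payload).hexdigest()   # payload only
    return sums

# hypothetical segments from some format-aware parser
segs = [("header", b"Author: X\nDOI: 10.1000/demo\n"), ("data", b"\x00\x01\x02\x03")]
before = segment_checksums(segs)

# editing the metadata segment leaves the data checksum intact
segs[0] = ("header", b"Author: X\nDOI: 10.1000/updated\n")
after = segment_checksums(segs)
print(before["DATASUM"] == after["DATASUM"])  # prints True
```

The DOI shown is a made-up placeholder; the point is only that the "DATASUM" survives a metadata edit while the header's own checksum changes.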

Of course, I personally don't like changing files if I can help it.
If it were me, I'd keep the metadata outside the file;  if you're
using BagIt, you could easily add additional metadata outside of
the data directory.[2]

If you're just doing this internally, and don't need the DOI to be
attached to the file when it's served, you could also look into
file systems that support arbitrary metadata.  Older Macs used
to use this, where there was a 'data fork' and a 'resource fork',
but you had to have a service that knew to only send the data fork.
Other OSes support forks, but some also have 'extended file
attributes', which allows you to attach a few key/value pairs
to the file.  (exact limits are dependent upon the OS).

-Joe


[1] http://fits.gsfc.nasa.gov/registry/checksum.html
[2] https://tools.ietf.org/html/draft-kunze-bagit ; 
http://en.wikipedia.org/wiki/BagIt


Re: [CODE4LIB] Plagiarism checker

2015-01-23 Thread Joe Hourcle
On Jan 23, 2015, at 9:44 AM, Mark A. Matienzo wrote:

> I believe Turnitin and SafeAssign both compare the text of submissions to
> against external sources (e.g., SafeAssign uses ABI/INFORM, among others).
> I am not certain if they compare submissions against each other.

My understanding of TurnItIn, at least initially, was that they
built their corpus on existing submissions.  

(they had some deals with universities back when they started up
to use their service for free or cheap, so that they could build
up their corpus).


> However, if you're looking for something along the lines of what Dre
> suggests, you could use ssdeep, which is an implementation of a piecewise
> hashing algorithm [0]. The issue with that you would have to assume that
> all students would probably be using the same file format.
> 
> You could also use something like Tika to extract the text content from
> all the submissions, and then compare them against each other.

I'd agree on extracting the text.  MS Word used to store documents
as strings of edits, making it difficult to compare two
documents for similarity without parsing the format.

(I don't know if they still do this in .docx)
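Once the text is extracted (with Tika or otherwise), a crude pairwise comparison can be done with Python's standard-library difflib.  This is only a sketch of the idea -- it is not what Turnitin does, and unlike ssdeep it compares full text rather than piecewise hashes; the filenames and strings are made up:

```python
import difflib
from itertools import combinations

def similarity(a, b):
    # ratio() returns 0.0-1.0; values near 1.0 mean near-identical text
    return difflib.SequenceMatcher(None, a, b).ratio()

submissions = {
    "alice.txt": "The quick brown fox jumps over the lazy dog.",
    "bob.txt":   "The quick brown fox leaps over the lazy dog.",
    "carol.txt": "Completely unrelated essay text goes here.",
}

# compare every pair of submissions against each other
for (n1, t1), (n2, t2) in combinations(submissions.items(), 2):
    print(f"{n1} vs {n2}: {similarity(t1, t2):.2f}")
```

For real submissions you'd want to normalize whitespace and case first, and SequenceMatcher is quadratic in the worst case, so long documents get slow.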

-Joe


Re: [CODE4LIB] Lost thread - centrally hosted global navbar

2015-01-10 Thread Joe Hourcle
On Jan 10, 2015, at 8:37 PM, Jason Bengtson wrote:

> Do you have access to the server-side? Server side scripting languages (and
> the frameworks and CMSes built with them) have provisions for just this
> sort of thing. Include statements in PHP and cfinclude tags in coldfusion,
> for example. Every Content Management System I've used has had a provision
> to create reusable content that can be added to multiple pages as blocks or
> via shortcodes. If you can use server-side script I recommend it; that's
> really the cleaner way to do this sort of thing. Another option you could
> use that avoids something like iframes is to create a javascript file that
> dynamically creates the navbar in your pages. Just include the
> javascript file in any page you want the toolbar to appear in. That method
> adds some overhead to your pages, but it's perfectly workable if
> server-side script is out of reach.


The javascript trick works pretty well when you have people 
mirroring your site via wget (as they won't run the JS, and
thus won't try to retrieve all of the images that are used 
to make the page pretty every time they run their mirror job).

You can see it in action at:

http://stereo-ssc.nascom.nasa.gov/data/ins_data/

The drawback is that some browsers have a bit of a flash
when they first hit the page.  It might be possible to
mitigate the problem by having the HTML set the background
to whatever color the background will be changed to, but I
don't quite have the flexibility to do that in my case, due
to how the page is being generated.

-Joe

ps.  It's been years since I've done ColdFusion, but I
remember there being a file that you could set up, that would
automatically get inserted into every page in that
directory, or in sub-directories.  I want to say it was
often used for authentication and such, but it might be
possible to use it for this.  If nothing else, you could load
the header into a variable, and have the pages just print the
variable in the right location.


Re: [CODE4LIB] linked data and open access

2014-12-19 Thread Joe Hourcle
On Dec 19, 2014, at 12:28 PM, Kyle Banerjee wrote:

> On Fri, Dec 19, 2014 at 7:57 AM, Joe Hourcle 
> wrote:
> 
>> 
>> I can't comment on the linked data side of things so much, but in
>> following all of the comments from the US's push for opening up access to
>> federally funded research, I'd have to say that capitalism and
>> protectionist attitudes from 'publishers' seem to be a major factor in the
>> fight against open access.
>> 
> 
> That definitely doesn't help. But quite a few players own this problem.
> 
> Pockets where there is a culture of openness can be found but at least in
> my neck of the woods, researchers as a group fear being scooped and face
> incentive structures that discourage openness. You get brownie points for
> driving your metrics up as well as being first and novel, not for investing
> huge amounts of time structuring your data so that everyone else can look
> great using what you created.

There's been a lot of discussion of this problem over the last ~5 years or
so.  The general consensus is that:

1. We need better ways for people to acknowledge data being re-used.

a. The need for standards for citation so that we can use 
   bibliometric tools to extract the relationships
b. The need for a citation specifically to the data, and not
   a proxy (eg, the first results or instrument papers), to show
   that maintaining the data is still important.
c. Shift the work of determining how to acknowledge the data
   from the re-user back to the distributor of the data.

2. We need standards to make it easier for researchers to re-use data.

Findability, accessibility of the file formats, documentation of
data, etc.

3. We need institutions to change their culture to acknowledge that 
   producing really good data is as important for the research ecosystem
   as writing papers.  This includes decisions regarding awarding grants,
   tenure & promotion, etc.


Much of this is covered by the Joint Declaration of Data Citation
Principles:

https://force11.org/datacitation

There are currently two sub-groups; one working on dissemination, to
make groups aware of the issues & the principles, and another (that I'm
on) working on issues of implementation.  We actually just submitted
something to PeerJ this week, on how to deal with 'machine actionable'
landing pages:

https://peerj.com/preprints/697/

(I've been pushing for one of the sections to be clarified, so feel
free to comment ... if enough other people agree w/ me, maybe I can
get my changes into the final paper)


> Libraries face their own challenges in this regard. Even if we ignore that
> many libraries and library organizations are pretty tight with what they
> consider their intellectual property, there is still the issue that most of
> us are also under pressure to demonstrate impact, originality, etc. As a
> practical matter, this means we are rewarded for contributing to churn,
> imposing branding, keeping things siloed and local, etc. so that we can
> generate metrics that show how relevant we are to those who pay our bills
> even if we could do much more good by contributing to community initiatives.

But ... one of the other things that libraries do is make stuff available
to the public.  As most aren't dealing with data yet, getting it into
their IRs means that they've then got more stuff that they can serve,
which may help push up their metrics.

(not that I think those metrics are good ... I'd rather *not* transfer
data that people aren't going to use, but the bean counters like those
graphs of data transfer going up ... we just don't mention that it's
groups in China attempting to mirror our entire holdings)



> With regards to our local data initiatives, we don't push the open data
> aspect because this has practically no traction with researchers. What does
> interest them is meeting funder and publisher requirements as well as being
> able to transport their own research from one environment to another so
> that they can use it. The takeaway from this is that leadership from the
> top does matter.

The current strategy is to push for the scientific societies to implement
policies requiring the data be opened if it's to be used as evidence in
a journal article.  There are some exceptions*, but the recommendations
so far are to still set up the landing page to make the data citable,
but instead of linking directly to the data, provide an explanation of
what the procedures are to request access.

Through this, the requirement becomes: if the researcher wants
to publish their paper, they have to provide the data, too.

We've run into a few interesting snags, though.  For instance, some a

Re: [CODE4LIB] linked data and open access

2014-12-19 Thread Joe Hourcle
On Dec 19, 2014, at 9:48 AM, Eric Lease Morgan wrote:

> I don’t know about y’all, but it seems to me that things like linked data and 
> open access are larger trends in Europe than here in the United States. Is 
> there a larger commitment to sharing in Europe when compared to the United 
> States? If so, is this a factor based on the nonexistence of a national 
> library in the United States? Is this your perception too? —Eric Morgan


I can't comment on the linked data side of things so much, but in following all 
of the comments from the US's push for opening up access to federally funded 
research, I'd have to say that capitalism and protectionist attitudes from 
'publishers' seem to be a major factor in the fight against open access.

I've placed 'publishers' in quotes, because groups that I would've considered 
to have been 'scientific societies' submitted comments against the opening up 
of the research, and in the case of AGU, referred to themselves multiple times 
as a 'publisher' and never as a 'society'.[1]  I dropped my membership when I 
realized that.


Statements from the 2011 RFI from OSTP:

http://www.whitehouse.gov/administration/eop/ostp/library/publicaccess


Statements from the 2013 NAS meetings:

http://sites.nationalacademies.org/DBASSE/CurrentProjects/DBASSE_082378

(note that I made statements at the National Academies meeting on opening 
access to federally funded research data)



[1] 
http://www.whitehouse.gov/sites/default/files/microsites/ostp/scholarly-pubs-(%23065).pdf

-Joe



ps. I still haven't seen what any of the official policies are (last year's 
government shutdown delayed the white house response to their submissions, and 
I have no idea if they've finally publicized anything) ... but I hosted a 
session at the AGU last year, where we had representatives from NOAA, NASA and 
USGS speak about what they were doing, and the NASA policy seemed to be heavily 
influenced by the more senior scientists ... who were more likely to be editors 
of journals.  They haven't updated their 'Data & Information Policy' 
(http://science.nasa.gov/earth-science/earth-science-data/data-information-policy/)
 page in over three years.


[CODE4LIB] Fwd: [Rdap] Call for Editors for IMLS-funded DataQ Project (Due 1/30/15)

2014-12-11 Thread Joe Hourcle
A few months ago, there was a discussion of trying to make a libraries 
site on Stack Exchange.

For those that were interested, this might be an interesting project to 
participate in, although their scope isn't necessarily all library questions.

-Joe


Begin forwarded message:

> From: Andrew Johnson 
> Date: December 11, 2014 5:34:37 PM EST
> To: "r...@mail.asis.org" 
> Subject: [Rdap] Call for Editors for IMLS-funded DataQ Project (Due 1/30/15)
> Reply-To: "Research Data, Access and Preservation" 
> 
> Call for Editors for the DataQ Project
> 
> The University of Colorado Boulder Libraries, the Greater Western Library 
> Alliance, and the Great Plains Network are excited to announce that we have 
> received funding from the Institute of Museum and Library Services to develop 
> an online resource called DataQ, which will function as a collaborative 
> knowledge-base of research data questions and answers curated for and by the 
> library community. Library staff from any institution may submit questions on 
> research data topics to the DataQ website, where questions will then be both 
> crowd-sourced and reviewed by an Editorial Team of experts. Answers to these 
> questions, from both the community and the Editorial Team, will be posted to 
> the DataQ website and will include links to resources and tools, best 
> practices, and practical approaches to working with researchers to address 
> specific research data issues.
> 
> We are currently seeking applications for our Editorial Team. If you are 
> interested in becoming a DataQ Editor, please fill out the application form 
> here by January 30, 2015: http://bit.ly/DataQApp.
> 
> DataQ Editors will be responsible for helping to identify initial content, 
> providing expert feedback on questions from DataQ users, and developing 
> policies and procedures for answering questions. The Editorial Team will 
> participate in regular virtual meetings and attend one in-person meeting in 
> Kansas City, MO in late May. Each Editor will receive a $1000 stipend to help 
> cover travel costs and time contributed to the project.
> 
> The initial term for each Editor will last until October 31, 2015 when the 
> grant period ends, but there may be opportunities to continue serving beyond 
> the life of the grant based on the outcome of the project.
> 
> Additional opportunities to contribute to DataQ will be announced soon. For 
> all of the latest information about DataQ, please follow 
> @ResearchDataQ on Twitter. Please send any 
> questions about DataQ to the project Co-PIs Andrew Johnson at 
> andrew.m.john...@colorado.edu and Megan 
> Bresnahan at 
> megan.bresna...@colorado.edu.
> 
> -
> Andrew Johnson
> Assistant Professor; Research Data Librarian
> University of Colorado Boulder Libraries
> Phone: 303-492-6102
> Website: https://data.colorado.edu/
> ORCID iD: -0002-7952-6536
> Impactstory Profile: https://impactstory.org/AndrewJohnson
> ___
> Rdap mailing list
> r...@mail.asis.org
> http://mail.asis.org/mailman/listinfo/rdap


Re: [CODE4LIB] looking for a good PHP table-manipulating class

2014-12-11 Thread Joe Hourcle
On Dec 11, 2014, at 4:32 PM, Ken Irwin wrote:

> Hi folks,
> 
> I'm hoping to find a PHP class that designed to display data in tables, 
> preferably able to do two things:
> 1. Swap the x- and y-axis, so you could arbitrarily show the table with 
> y=Puppies, x=Kittens or y=Kittens,x=Puppies
> 2. Display the table either using plain text columns or formatted html
> 
> I feel confident that in a world of 7 billion people, someone must have 
> wanted this before.


There's much more work being done in javascript tables these days than in 
backend software.

Unfortunately, I've never found a good matrix to compare features between the 
various 'data table' or 'data grid' implementations.

I did start evaluating a lot of them a while back, but the problem is that you 
have to go through them all to figure out what the different features might be, and 
then go back through a second time to see which ones might implement those 
features.

The second problem is that some are implemented as part of a given JS framework 
(eg, ExtJS), while other toolkits might have a dozen different 'data table' 
implementations (eg, jQuery).

-Joe


ps.  and as this wasn't a feature that I was looking for, this wasn't something 
that I tracked when I did my analysis.  I was looking for things like scaling 
to a thousand rows w/ 20 columns, rearranging/hiding columns, etc.
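For what it's worth, the two features Ken asked about (axis swap, plus plain-text or HTML rendering) are small enough to roll yourself. A minimal sketch follows, in Python just to illustrate the idea; the function names and sample data are invented, and a real implementation would need escaping and ragged-row handling:

```python
def transpose(rows):
    """Swap the x- and y-axes of a rectangular table (list of rows)."""
    return [list(col) for col in zip(*rows)]

def as_text(rows):
    """Render as aligned plain-text columns."""
    widths = [max(len(str(r[i])) for r in rows) for i in range(len(rows[0]))]
    return "\n".join(
        "  ".join(str(cell).ljust(w) for cell, w in zip(row, widths))
        for row in rows
    )

def as_html(rows):
    """Render as a minimal HTML table (no escaping -- sketch only)."""
    body = "".join(
        "<tr>" + "".join(f"<td>{cell}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>{body}</table>"

# y=names, x=animals ... transpose() gives the y=animals, x=names view
table = [["", "Kittens", "Puppies"],
         ["Fluffy", 3, 1],
         ["Rex", 0, 2]]

print(as_text(table))
print(as_text(transpose(table)))
```

The same list-of-rows structure feeds either renderer, so swapping the axes is just a matter of calling transpose() before display.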


Re: [CODE4LIB] Balancing security and privacy with EZproxy

2014-11-20 Thread Joe Hourcle
On Nov 19, 2014, at 11:47 PM, Dan Scott wrote:

> On Wed, Nov 19, 2014 at 4:06 PM, Kyle Banerjee 
> wrote:
> 
>> There are a number of technical approaches that could be used to identify
>> which accounts have been compromised.
>> 
>> But it's easier to just make the problem go away by setting usage limits so
>> EZP locks the account out after it downloads too much.
>> 
> 
> But EZProxy still doesn't let you set limits based on the type of download.
> You therefore have two very blunt sledge hammers with UsageLimit:
> 
> - # of downloads (-transfers)
> - # of megabytes downloaded (-MB)


[trimmed]

I'm not familiar with EZProxy, but if it's running on an OS that you have 
control of (and not some vendor locked appliance), you likely have other tools 
that you can use for rate limiting.

For instance, I have a CGI on a webserver that's horribly resource intensive 
and takes quite a while to run.  Most people wonder what's taking so long, and 
reload multiple times, thinking the process is stuck ... or they know what's 
going on, and will open up multiple instances in different tabs to reduce their 
wait.

So I have the following IP tables rule:

-A INPUT -p tcp -m tcp --dport 80 --tcp-flags FIN,SYN,RST,ACK SYN -m 
connlimit --connlimit-above 5 --connlimit-mask 32 -j REJECT --reject-with 
tcp-reset

I can't remember if it starts blocking at the 5th connection, or only once 
they're above 5, but it keeps us from having one IP address with 20+ copies 
running at once.

...

And back from my days of managing directory servers -- brute forcing was a 
horrible problem with single sign-on.  We didn't have a good way to temporarily 
lock accounts for repeatedly failing passwords at the directory server (which 
would also cause a denial of service, as you could lock someone else) ... so it 
had to be up to each application to implement ... which of course, they didn't.

... so you'd have something like a webpage that required authentication that 
someone could brute force ... and then they'd also get access to a shell 
account and whatever else that person had authorization for.

-Joe


(and on that 'wow, I feel old' note ... it's been 10+ years since I've had to 
manage an LDAP server ... it's possible that they've gotten better about that 
issue since then)


Re: [CODE4LIB] Past Conference T-Shirts?

2014-11-06 Thread Joe Hourcle
On Nov 6, 2014, at 8:11 PM, Josh Wilson wrote:

> The Code4Lib version is clearly of superior quality, design, and
> provenance, but I actually thought this was an internet thing of unknown
> origin? e.g.,
> 
> http://www.cafepress.com/mf/17182533/metadata_tshirt
> http://www.redbubble.com/people/charlizeart/works/1280530-metadata?p=t-shirt
> 
> Perhaps a case of multiple discovery.

I don't know when I first saw it, but I know variations of the Helvetica shirt 
were first:

http://welovetypography.com/post/10993/

-Joe


ps.  being a font snob ... the Cafe Press shirt just has horrible kerning
 between the 'T' and 'A'.  The Code4Lib one is better, but misses the
 little bevel on the 'T' to have it mate up tight to the 'A', and the
 kerning between 'A' and 'D' could be a bit tighter.  The Helvetica
 shirt I linked to clearly slid the letters around (as the 'T' has
 the beveled edge to mate up to the now-missing 'A').





> 
> On Thu, Nov 6, 2014 at 5:37 PM, Goben, Abigail  wrote:
> 
>> Joshua Gomez did the original, correct? http://wiki.code4lib.org/
>> index.php/2013_t-shirt_design_proposals
>> 
>> Thanks for working on this Riley! I know several people who will be very
>> happy to be able to purchase it.
>> 
>> 
>> On 11/6/2014 2:48 PM, Riley Childs wrote:
>> 
>>> Some one sent me the design, if you did it please let me know so I can
>>> give attribution.
>>> //Riley
>>> 
>>> Sent from my Windows Phone
>>> 
>>> --
>>> Riley Childs
>>> Senior
>>> Charlotte United Christian Academy
>>> Library Services Administrator
>>> IT Services Administrator
>>> (704) 537-0331x101
>>> (704) 497-2086
>>> rileychilds.net
>>> @rowdychildren
>>> I use Lync (select External Contact on any XMPP chat client)
>>> 
>>> From: todd.d.robb...@gmail.com
>>> Sent: ‎11/‎6/‎2014 3:41 PM
>>> To: CODE4LIB@LISTSERV.ND.EDU
>>> Subject: Re: [CODE4LIB] Past Conference T-Shirts?
>>> 
>>> Joshua,
>>> 
>>> That is so gnarly!!!
>>> 
>>> On Thu, Nov 6, 2014 at 1:13 PM, Riley Childs 
>>> wrote:
>>> 
>>> Ok, will do, I didn't actually design it may take a little time while I
 dig though download folders from my backups, I will try and get to it
 next
 week
 //Riley
 
 
 --
 Riley Childs
 Senior
 Charlotte United Christian Academy
 IT Services Administrator
 Library Services Administrator
 https://rileychilds.net
 cell: +1 (704) 497-2086
 office: +1 (704) 537-0331x101
 twitter: @rowdychildren
 Checkout our new Online Library Catalog: https://catalog.cucawarriors.
 com
 
 Proudly sent in plain text
 
 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Jason Stirnaman
 Sent: Thursday, November 06, 2014 2:46 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Past Conference T-Shirts?
 
 Riley,
 Could you fix the spelling on "More then just books" in the store? Should
 be "More than just books"
 
 Thanks,
 Jason
 
 Jason Stirnaman
 Lead, Library Technology Services
 University of Kansas Medical Center
 jstirna...@kumc.edu
 913-588-7319
 
 On Nov 6, 2014, at 1:04 PM, Riley Childs 
 wrote:
 
 Yes, but I have been unsuccessful thus far in getting a vector file/high
> 
 res transparent image.
 
> If you have one and can send please do so and I will put it up on the
> 
 code4lib store (code4lib.spreadshirt.com).
 
> Thanks
> //Riley
> 
> 
> --
> Riley Childs
> Senior
> Charlotte United Christian Academy
> IT Services Administrator
> Library Services Administrator
> https://rileychilds.net
> cell: +1 (704) 497-2086
> office: +1 (704) 537-0331x101
> twitter: @rowdychildren
> Checkout our new Online Library Catalog:
> 
 https://catalog.cucawarriors.com
 
> Proudly sent in plain text
> 
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> 
 Goben, Abigail
 
> Sent: Thursday, November 06, 2014 1:10 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] Past Conference T-Shirts?
> 
> My Metadata t-shirt, from C4L 2013, has been getting some
> 
 interest/requests of where others can purchase. I thought we'd talked
 about
 that here.  Was there a store ever finally set up that I could refer
 people
 to?
 
> Abigail
> 
> --
> Abigail Goben, MLS
> Assistant Information Services Librarian and Assistant Professor Library
> 
 of the Health Sciences University of Illinois at Chicago
 
> 1750 W. Polk (MC 763)
> Chicago, IL 60612
> ago...@uic.edu
> 
 
>>> 
>>> --
>>> Tod Robbins
>>> Digital Asset Manager, MLIS
>>> todrobbins.com | @todrobbi

Re: [CODE4LIB] Wednesday afternoon reverie

2014-11-06 Thread Joe Hourcle
On Nov 6, 2014, at 5:17 PM, Karen Coyle wrote:

> Cynthia, it's been a while but I wanted to give you feedback...
> 
> Ranking on importance based on library ownership and/or circulation is 
> something that I've seen discussed but not implemented -- mainly due to the 
> difficulty of gathering the data from library systems. But it seems like an 
> obvious way to rank results, IMO.
> 
> Too bad that one has to pay for BISAC headings. They tend to mirror the 
> headings in bookstores (and ebook stores) that people might be familiar with. 
> They capture fiction topics, especially, in a way to resonates with some 
> users (topics like "Teen Paranormal Romance").


I believe that they were created specifically for bookstores.

The problem is that the publishers (likely with support of the authors) get to 
decide where stuff should be filed.

As I help manage the Friends' bookstore at my local library branch, I've seen 
"Creation Science" (on an 'E' book with archaeologists & dinosaur bones on the 
cover) and a few others that make me cringe.

-Joe

ps.  I haven't seen "Teen Paranormal Romance" specifically as a heading 
(although yes, I've seen those books) ... I'm waiting for "Amish Paranormal 
Romance"  (although I don't know if "Amish Romance" is an official BISAC 
heading).

pps.  The nature of the BISAC headings make them less useful for determining if 
a book's actually on the shelves.  It's fine for general browsing, but it 
reminds me of the filing system from Black Books (from 0:40 to ~1:45):

https://www.youtube.com/watch?v=RZVDr4r9HEw




> On 10/22/14 1:25 PM, Harper, Cynthia wrote:
>> So I'm deleting all the Bisac subject headings (650_7|2bisacsh) from our 
>> ebook records - they were deemed not to be useful, especially as it would 
>> entail a for-fee indexing change to make them clickable.  But I'm thinking 
>> if we someday have a discovery system, they'll be useful as a means for 
>> broader-to-narrower term browsing that won't require translation to English, 
>> as would call number ranges.
>> 
>> As I watch the system slowly chunk through them, I think about how library 
>> collections and catalogs facilitate jumping to the most specific subjects, 
>> but browsing is something of an afterthought.
>> 
>> What if we could set a ranking score for the "importance" of an item in 
>> browsing, based on circulation data - authors ranked by the relative 
>> circulation of all their works, same for series, latest edition of a 
>> multi-edition work given higher ranking, etc.?  Then have a means to set the 
>> threshold importance value you want to look at, and browse through these 
>> general Bisac terms, or the classification?  Or have a facet for 
>> "importance" threshold.  I see Bisac sometimes has a broadness/narrowness 
>> facet ("overview") - wonder how consistently that's applied, enough to be 
>> useful?
>> 
>> Guess those rankings would be very expensive in compute time.
>> 
>> Well, back to the deletions.
>> 
>> Cindy Harper
>> Electronic Services and Serials Librarian
>> Virginia Theological Seminary
>> 3737 Seminary Road
>> Alexandria VA 22304
>> 703-461-1794
>> char...@vts.edu
> 
> -- 
> Karen Coyle
> kco...@kcoyle.net http://kcoyle.net
> m: +1-510-435-8234
> skype: kcoylenet/+1-510-984-3600


Re: [CODE4LIB] Stack Overflow

2014-11-04 Thread Joe Hourcle
On Nov 4, 2014, at 1:33 PM, Mark Pernotto wrote:

> I think all of this is really useful. I'd be lying if I said I didn't get a
> lot of great ideas and results from StackOverflow.
> 
> However, I've been burned quite a bit as well - deprecated code, inaccurate
> results, or just the wrong answer gets accepted. There seems to be such a
> push to 'accept as answer' that no one gives a second thought to
> alternative solutions. Because one size doesn't fit all - I think we all
> know that.

I hate it when I answer something 15-20 min after someone posts a
question, and they flag it as the 'correct' answer.  Someone else
might have some better response. *

I made the mistake of accepting an answer without fully testing it:

http://dba.stackexchange.com/q/30/51

Notice how no one else gave an alternative, as it works ... but I just
added the comment that the performance was much, much worse than when
I started.



... and we run into issues where what might have once been the correct
answer no longer is (because there's a new, better alternative, or
because some tool's no longer available (or not recommended because
of a horrible security flaw).


> I guess I'm trying to advocate not to rely on this type of resource
> completely when resolving your coding challenges.  While it can certainly
> be a tremendous learning tool, keep an objective mind for what tool best
> fits your institution's purpose.

What I'd like to see is some place where we can have the summary
of recommended practices for various problems ... lots of people
can contribute, and it can be kept up-to-date.  Basically, a
crowdsourced FAQ.  The problem is, you can't just set up a wiki and
expect people to contribute.

Say what you will about StackExchange's herd-mentality about the
'right type of questions'**, their system gets people to contribute.



* for the people who complain about the grubbing for reputation:
  it's gamification.  I just hate the people who can manage to pop
  out reasonable sounding responses 10 min after the question was
  asked that are clearly just internet research because I *know*
  the answer is wrong. ... one person on Seasoned Advice was gaming
  the system; if you started downvoting their questions, they'd
  just delete them, but they were getting almost all of upvotes
  due to their 'early and plausible' strategy.

** Yet, I still have the 4th-rated question on Seasoned Advice
  for "Translating cooking terms between US / UK / AU / CA / NZ"
  ( http://cooking.stackexchange.com/q/784/67 ), simply because
  I got it in back when 'community wiki' was considered an option.
  Lots of other interesting questions have gotten closed as their
  community cracked down on 'em, though.  (eg, cookbook
  recommendations)


-Joe

ps.  Nothing frustrates me more than scouring the internet due to a
 problem you've run into ... and you *finally* find a 2 year old
 post on some forum that is the *exact* symptoms you have ... and
 you scroll through all of the replies of things you've already
 tried ... and get to the last post, from the person with the
 problem and they've posted 'nevermind, I fixed it'.


Re: [CODE4LIB] Stack Overflow

2014-11-04 Thread Joe Hourcle
On Nov 4, 2014, at 9:12 AM, Schulkins, Joe wrote:

> Presumably I'm not alone in this, but I find Stack Overflow a valuable 
> resource for various bits of web development and I was wondering whether 
> anyone has given any thought about proposing a Library Technology site to 
> Stack Exchange's Area 51 (http://area51.stackexchange.com/)? Doing a search 
> of the proposals shows there was one for 'Libraries and Information Science' 
> but this closed 2 years ago as it didn't reach the required levels during the 
> beta phase.

Some history on the Stack Exchange site:

1. Before 'Stack Exchange 2.0', they used to let other sites pay them to host 
Q&A sites.  There had been a library-focused site on Unshelved:

http://www.unshelved.com/2010-7-15

2. We got *hundreds* of people from Unshelved Answers to sign up on Area 51 ... 
but they wouldn't start up the site unless enough people with high enough 
reputation on existing 'Stack Exchange 2.0' sites expressed interest, claiming 
that they needed sufficient people with knowledge of the system.  I tried 
lobbying for them to count people w/ experience from Unshelved Answers, but 
they wouldn't do it.

3. It took over a year for the 'Libraries' proposal to get enough support to be 
accepted; by then, I assume most library folks had moved on.

4. They then named the site 'Library and Information Science', not 'Libraries'.

http://discuss.area51.stackexchange.com/q/3846/5710

   After my complaining, they changed it to 'Libraries and Information 
Science', but there was still a major problem:

5. As if all of the rest wasn't bad enough, we then had a bunch of non-library 
people closing answers because there wasn't a single definite answer, which was 
a large number of the questions on Unshelved Answers ... and most of the 
'example' questions were in that category as well:


https://web.archive.org/web/20120325030045/http://area51.stackexchange.com/proposals/12432/libraries-information-science



> The reason I think this might be useful is that instead of individual places 
> to go for help or raise questions (i.e. various mailing lists) there could be 
> a 'one-stop' shop approach from which we could get help with LMSs, discovery 
> layers, repository software etc. I appreciate though that certain vendors 
> aren't particularly open (yes, Innovative I'm looking at you here) and might 
> not like these things being discussed on an open forum.
> 
> Does anybody else think this might be useful? Would such a forum be shot down 
> by all the vendors legalese wrapped up in their Terms and Conditions? Or are 
> you happy with the way you go about getting help?


I think that the Stack Exchange culture & policies make it a bad fit for our 
community.  I think that yes, there is a need for such a site, but that the 
issues with immediately closing questions without a clear answer are a *huge* 
problem.  If questions were easily answered, we'd have done the research and 
answered them ourselves (most of us have LIS degrees and know how to research 
things!).

You might also be able to get support from Unshelved again, and if we as a 
community can put together a site, have them brand it as 'Unshelved Answers' 
again.

-Joe

ps.  I'm currently the moderator of OpenData.StackExchange.com; I was 
previously the moderator of Seasoned Advice (aka. cooking.stackexchange.com)

pps.  I also objected when they changed the name of the 'databases' proposal to 
'database administrators', which many of us felt narrowed the scope 
dramatically ( http://meta.dba.stackexchange.com/q/1/51 ; 
http://meta.dba.stackexchange.com/q/11/51 ).  I don't even bother with the site 
these days.


Re: [CODE4LIB] Terrible Drupal vulnerability

2014-10-31 Thread Joe Hourcle
On Oct 31, 2014, at 11:46 AM, Lin, Kun wrote:

> Hi Cary,
> 
> I don't know from whom. But for the Heartbleed vulnerability earlier this 
> year, they as well as some other big providers like Google and Amazon were 
> notified and patched before it was announced. 

If they have an employee who contributes to the project, it's possible that this
was discussed on development lists before it was sent down to user level mailing
lists.  

Odds are, there's also  some network of people who are willing to give things a
cursory review / beta test in a more controlled manner before they're officially
released (and might break thousands of websites).  It would make sense that
companies who derive a good deal of their profits in supporting software would
participate in those programs, as well.

I could see categorizing either of those as 'ahead of the *general* public',
which was Kun's assertion.

-Joe



> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Cary 
> Gordon
> Sent: Friday, October 31, 2014 11:10 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Terrible Drupal vulnerability
> 
> How do they receive vulnerability report ahead of general public? From whom?
> 
> Cary
> 
> On Friday, October 31, 2014, Lin, Kun  wrote:
> 
>> If you are using drupal as main website, consider using Cloudflare Pro.
>> It's just $20 a month and worth it. They'll help block most attacks. 
>> And they usually receive vulnerability report ahead of general public.
>> 
>> Kun
>> 
>> -Original Message-
>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU 
>> ] On Behalf Of Cary Gordon
>> Sent: Friday, October 31, 2014 9:59 AM
>> To: CODE4LIB@LISTSERV.ND.EDU 
>> Subject: Re: [CODE4LIB] Terrible Drupal vulnerability
>> 
>> This is what I posted to the Drupal4Lib list:
>> 
>> 
>> 
>> By now, you should have seen https://www.drupal.org/PSA-2014-003 and 
>> heard about the "Drupageddon" exploits, and you may be wondering if 
>> you were vulnerable or, if you were hit by this, how you can tell and 
>> what you should do. Drupageddon affects Drupal 7, Drupal 8 and, if you 
>> use the DBTNG module, Drupal 6.
>> 
>> The general recommendation is that if you do not know or are unsure of 
>> your server's security and you did not either update to Drupal 7.32 or 
>> apply the patch within a few hours of the notice, you should assume 
>> that your site (and server) was hacked and you should restore 
>> everything to a backup from before October 15th or earlier. If your 
>> manage your server and you have any doubts about your file security, 
>> you should restore that to a pre 10/15 image, as well or do a reinstall of 
>> your server software.
>> 
>> I know this sounds drastic, and I know that not everyone will do that.
>> There are some tests you can run on your server, but they can only 
>> verify the hacks that have been identified.
>> 
>> At MPOW, we enforce file security on our production servers. Our 
>> deployments are scripted in our continuous integration system, and 
>> only that system can write files outside of the temporary file directory (e.g.
>> /sites/site-name/files). We also forbid executables in the temporary 
>> file system. This prevents many exploits related to this issue.
>> 
>> Of course, the attack itself is on the database, so even if the file 
>> system is not compromised, the attacker could, for example, get admin 
>> access to the site by creating an account, making it an admin, and 
>> sending themselves a password. While they need a valid email address 
>> to set the password, they would likely change that as soon as they were in.
>> 
>> Some resources:
>> https://www.drupal.org/PSA-2014-003
>> 
>> https://www.acquia.com/blog/learning-hackers-week-after-drupal-sql-inj
>> ection-announcement
>> 
>> http://drupal.stackexchange.com/questions/133996/drupal-sa-core-2014-0
>> 05-how-to-tell-if-my-server-sites-were-compromised
>> 
>> I won't attempt to outline every audit technique here, but if you have 
>> any questions, please ask them.
>> 
>> The takeaway from this incident, is that while Drupal has a great 
>> security team and community, it is incumbent upon site owners and 
>> admins to pay attention. Most Drupal security issues are only 
>> exploitable by privileged users, and admins need to be careful and 
>> read every security notice. If a vulnerability is publicly exploitable, you 
>> must take action immediately.
>> 
>> Thanks,
>> 
>> Cary
>> 
>> On Thu, Oct 30, 2014 at 5:24 PM, Dan Scott wrote:
>> 
>>> Via lwn.net, I came across https://www.drupal.org/PSA-2014-003 and 
>>> my heart
>>> sank:
>>> 
>>> """
>>> Automated attacks began compromising Drupal 7 websites that were not 
>>> patched or updated to Drupal 7.32 within hours of the announcement 
>>> of
>>> SA-CORE-2014-005 - Drupal core - SQL injection

Re: [CODE4LIB] More Metadata Questions

2014-10-30 Thread Joe Hourcle
On Oct 30, 2014, at 8:47 AM, P.G. wrote:

> Thanks for the early replies, here are more questions as some of you wanted
> to get more details.
> 
> What tools are available to create metadata?

define 'create metadata'.

We generally call it 'cataloging' or 'annotation' in my field, as we don't 
'create' anything -- we only document what already exists.


> What tools are available to check the structure of metadata?

It's a function of the metadata standard.  If they're in XML, they might have 
an XML schema that you can test against.  For FITS (the image file format, not 
the software for archiving) there's a website that you can give them a file, 
and they'll validate it for you.
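A minimal structural check is easy to script even without a formal schema. The sketch below uses Python's standard library against a made-up record with invented element names; a real workflow would validate against the actual XML schema for the metadata standard in use:

```python
import xml.etree.ElementTree as ET

def check_structure(xml_text, required_tags):
    """Parse a record (catching well-formedness errors) and report
    which required elements are missing."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as e:
        return ["not well-formed: %s" % e]
    present = {elem.tag for elem in root.iter()}
    return ["missing element: %s" % t for t in required_tags if t not in present]

# hypothetical record -- tag names are illustrative, not a real standard
record = """<record>
  <title>Example Item</title>
  <creator>Doe, Jane</creator>
</record>"""

print(check_structure(record, ["title", "creator", "date"]))
# → ['missing element: date']
```

Schema-aware validation (XSD, RELAX NG) catches far more than this, but even a presence check like the above flags the most common cataloging omissions.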


> What tools are available to check the accuracy of metadata?

hehehehe.

Oh, wait, you were probably serious.

There are some things that you can validate easily (eg, if we know the size of 
an image, we can do simple consistency checks to make sure the size matches the 
height * width * bits/pixel) ... and there might be some statistical stuff that 
you can re-compute to validate (eg, mean, std. deviation, min, max, 10% and 90% 
thresholds).  And we might be able to ensure that things are consistent (within 
tolerance) if they're specified in more than one manner (eg, if we know the 
pixel scale, and the width in pixels, and the image width in arcseconds, we can 
verify that pixel width * pscale == angular width).
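Cross-checks like these are cheap to automate. A sketch, with hypothetical field names (any real archive would use its own keywords, and the tolerance is a made-up choice):

```python
def consistency_problems(meta, tolerance=0.01):
    """Cross-check redundant image metadata fields; return a list of
    human-readable problems (empty list == all checks passed)."""
    problems = []
    # file size should match height * width * bytes per pixel
    expected = meta["height"] * meta["width"] * meta["bits_per_pixel"] // 8
    if meta["data_size"] != expected:
        problems.append("data_size != height * width * bits/pixel")
    # pixel scale (arcsec/px) * width (px) should equal angular width (arcsec)
    derived = meta["pixel_scale"] * meta["width"]
    if abs(derived - meta["angular_width"]) > tolerance * meta["angular_width"]:
        problems.append("pixel_scale * width != angular_width")
    return problems

meta = {"height": 1024, "width": 1024, "bits_per_pixel": 16,
        "data_size": 1024 * 1024 * 2,
        "pixel_scale": 0.6, "angular_width": 614.4}
print(consistency_problems(meta))   # → []
```

None of this catches an upside-down spacecraft or a drifting clock, of course; those need comparison against an external reference, not internal consistency.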

But for the rest of it ... you're likely on your own.  We were serving browse 
images from a telescope where person who created the JPEGs forgot to check a 
flag to see if the spacecraft was upside down.  We didn't notice for months 
until I put together CGI for our website that would make a slideshow of images 
... and the sun was spinning backwards.  It was years before someone realized 
that another spacecraft's clock was drifting rather significantly, and that all 
of the times were bad.  We've got other images where the pointing information 
is known to be bad and you have to use an external catalog to correct them 
(because they don't want to re-process all of the files, and risk other 
corruption).


> Can't I just buy software that conforms to the Standard?

You can ... but standards change, tools implement standards wrong, and there 
are lots of standards.


> What is the standard metadata for images, audios and videos?

Do you mean 'what are the standards' or 'what concepts do we record about the 
items in our collection'?

There are a LOT of general categories of metadata (content, structure, 
archiving, context, etc.).  Just for images alone, there are two major 
competing standards for content (EXIF and IPTC).  Then there's Adobe's XMP, 
which allows you to put any metadata onto almost any type of file.  For 
astronomy images, you also want VAMP.  Structural metadata depends on the file 
format being used.  Archival and administrative metadata is typically not 
attached to the files themselves, but is a function of the archival system 
being used.

I don't deal with audio or videos, but I'm guessing that there are similar 
issues there.

-Joe


Re: [CODE4LIB] Subject: Re: Why learn Unix?

2014-10-28 Thread Joe Hourcle
On Oct 28, 2014, at 8:11 PM, Alex Berry wrote:

> And that is why alias rm='rm -I' was invented.


Do not *ever* set this to be a default for new users.

During my undergrad, I worked at helpdesk for the group that managed the 
computer labs, the general use unix & cms systems (not content management 
system ... an IBM mainframe ... one of the last bitnet-to-internet gateways).

The engineering school set up a bunch of default aliases for their systems... 
including "rm='rm -i'".

This meant that when people came to *our* servers ... they'd decide to 
interactively clean out their home directory by typing:

rm *

... and then wonder why it didn't prompt them.

... and then come to the computer lab to complain.  

... and then complain some more when we wouldn't immediately restore their 
files for them.  (our policy was technically disaster-recovery only, but in 
practice restores went to disaster recovery, upper-level management, or members 
of the faculty senate ... because restores from tape really, really sucked.)
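The gotcha is easy to demonstrate safely.  A hedged sketch (throwaway temp 
directory, nothing from the original incident):

```shell
# On hosts whose ~/.profile set  alias rm='rm -i',  users learned to
# expect a prompt before every deletion.  On a host without that alias
# (or in any non-interactive shell), the same command deletes silently:
demo=$(mktemp -d)            # throwaway sandbox, nothing real at risk
cd "$demo"
touch a.txt b.txt c.txt
rm *                         # no alias in effect: no prompt, files gone
ls "$demo" | wc -l           # counts 0 files remaining
```

(If anything, teach people to type -i explicitly when they want it, rather 
than training their muscle memory to depend on an alias that won't follow 
them to the next system.)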

-Joe


Re: [CODE4LIB] Why learn Unix?

2014-10-28 Thread Joe Hourcle
On Oct 28, 2014, at 10:07 AM, Joshua Welker wrote:

> There are 2 reasons I have learned/am learning Linux:
> 
> 1. It is cheaper as a web hosting platform. Not substantially, but enough to
> make a difference. This is a big deal when you are a library with a
> barebones budget or an indie developer (I am both).  Note that if you are
> looking for enterprise-level support, the picture is quite different.
> 
> 1a. A less significant reason is that Linux is much less resource-intensive
> on computers and works well on old/underpowered computers and embedded
> systems. If you want to hack an Android device or Chromebook to expand its
> functionality, Linux is what you want. I am running Ubuntu on my Acer C720
> Chromebook using Crouton, and now it has all the functionality of a
> full-fledged laptop at $200.

When I worked for an ISP in the late 1990s, our two FreeBSD servers that
handled *everything* were 75MHz Pentiums that another company had discarded.
Our network admin bridged my apartment to his using a 386 w/ a picoBSD install
that booted from a 3.5" floppy (to drive a WaveLAN card, before the days of
802.11).  I think one of the P75s was running fark.com for a while before
they added all of the commenting functionality.

It's amazing just how much functionality you can get out of hardware
that people have discarded by putting a system on it that doesn't
have a lot of cruft.


> 2. Many scripting languages and application servers were born in *nix and
> have struggled to port over to non-*nix platforms. For example, Python and
> Ruby both are a major pain to set up in Windows. Setting up a
> production-level Rails or Django server is stupidly overcomplicated in
> Windows to the point where it is probably easier just to use Linux. It's
> much easier to "sudo apt-get install" in Ubuntu than to spend hours tweaking
> environment variables and config files in Windows to achieve the same
> effect.

If you're going to run Python on windows, it used to be easier to download
a full 'WAMP' build (windows, apache, mysql, perl/php/python).  I don't
know what the current state of python installers is ... except for on
the Mac, where they're still a bit of a pain.

I have no idea on Ruby.


> I will go out on a limb here and say that *nix isn't inherently better than
> Windows except perhaps the fact that it is less resource-intensive (which
> doesn't apply to OSX, the most popular *nix variant). #1 and #2 above are
> really based on historical circumstances rather than any inherent
> superiority in Linux. Back when the popular scripting languages, database
> servers, and application servers were first developed in the 90s, Windows
> had  a very sucktastic security model and was generally not up to the task
> of running a server. Windows has cleaned up its act quite a bit, but the
> ship has sailed, at this point.
> 
> If you compare Windows today to Linux today, they are on very equal footing
> in terms of server features. The only real advantage Linux has at this point
> is that the big distros like Ubuntu have a much more robust package
> ecosystem that makes it much easier to install common server-side
> applications through the command line. But when you look at actually using
> and managing the OS, Linux is at a clear disadvantage. And if you compare
> the two as desktop environments, Windows wins hands-down except for a very
> few niche use cases. I say this as someone who uses a Ubuntu laptop every
> day.

For managing OSes, I admit that I haven't played with Windows 8, but I'm
still in the FreeBSD camp for servers.  (and not what Apple's done to it)

Windows might have an advantage if you're doing active directory w/
group policies, but I've heard horror stories from my neighbor about a
co-worker who 'hides' his changes by applying them to individual users (eg,
blocking which websites they can get to), making it difficult for someone
else to go in and clear them out when he's been too over-zealous.


> (Anyone who has read this far might be interested to know that Windows 10 is
> going to include an official MS-supported command line package management
> suite called OneGet that will build on the package ecosystem of the
> third-party Chocolatey suite.)

Very interesting.

-Joe


Re: [CODE4LIB] Why learn Unix?

2014-10-27 Thread Joe Hourcle
On Oct 27, 2014, at 12:38 PM, Bigwood, David wrote:

> Learning UNIX is fine. However, I do think learning SQL might be a better 
> investment. So many of our resources are in databases. Understanding 
> indexing, sorting and relevancy ranking of our databases is also crucial. 
> With linked data being all the rage knowing about sparql endpoints is 
> important.  The presentation of the information from databases under our 
> control  needs work. Is the information we present actionable or just strings?

Quite likely.  I wouldn't teach people SQL (and I've done plenty of pl/sql and 
t/sql programming) unless:

1. They had data they wanted to use that's already on an SQL server.
2. They had a (read-only) account on that server, so they could 
actually use it.

If they have to go about setting up a server (even if it's an installable 
application) and ingesting their data before they can analyze anything, they 
can get frustrated before they even start to see any useful results.

If they have some scenario where they need multiple tables and joins, then 
sure, teach them SQL ... but over the years, I've had weeks of SQL-related 
training*, and I don't know that I'd want to make anyone go through all of that 
if they're just trying to do some simple reports that could be done in other 
ways.  I wouldn't even suggest teaching people about indexing until they've 
tried doing stuff in SQL and wondered why it's so slow.
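Those "other ways" are often just the standard toolbox.  A hedged sketch, 
with made-up CSV files, of a relational-style join done entirely with 
join(1) (both inputs must be sorted on the key column):

```shell
cd "$(mktemp -d)"                       # scratch directory for the demo
printf '1,circulation\n2,reference\n' > depts.csv
printf '1,alice\n1,bob\n2,carol\n'    > staff.csv
join -t, depts.csv staff.csv            # joins on field 1 by default
# 1,circulation,alice
# 1,circulation,bob
# 2,reference,carol
```

For a one-off report, that's a lot less ceremony than standing up a server, 
creating tables, and ingesting the data first.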

Likewise, if there were some sort of non-SQL database for them to play with 
(even an LDAP server) that might have information of use to them, I'd teach 
them that first ... but I'd likely start w/ unix command line stuff (see below).


> Or maybe I just like those topics better and find the work being done there 
> fascinating?


Quite likely.  I still haven't found a good reason to wrap my head 
around sparql ... I guess in part because the stuff I'm dealing with isn't 
served as linked data.


...


On Oct 27, 2014, at 11:15 AM, Tod Olson wrote:

> There’s also something to be said for the Unix pipeline/filter model of 
> processing. That way of breaking down a task into small steps, wiring little 
> programs to filter the data for each step, building up the solution 
> iteratively, essentially a form of function composition. Immediately, you 
> can do a lot of powerful one-off or scripting tasks right from the command 
> line. More generally, it’s a very powerful model to have in your head, can 
> transform your thinking.


I 100% agree.

If I were to try to teach "unix" to a group, I'd come up with some scenarios
where command-line tools can actually help them, and show them how to automate
things that they'd have to do anyway.  (or tried to do, and gave up on)

For instance, if there's some sort of metric that they need, you can show
how simple `cut | sort | uniq | wc` can be used...

eg, if I have a 'common' or 'common+' webserver log file, I can get a quick
count of today's unique hosts via :

cut -d" " -f1 /var/log/httpd/access_log-2014.10.27 | sort | uniq | wc -l

If I wanted to see the top 10 hosts hitting us:

cut -d" " -f1 /var/log/httpd/access_log-2014.10.27 | sort | uniq -c | 
sort -rn | head -10

If you're lazy, and want to alias this so it didn't have to hard-code today's 
date:

cut -d" " -f1 `ls -1t /var/log/httpd/access_log* | head -1` | sort | 
uniq | wc -l

If your log files are rolled weekly, and we need to extract just today :
(note that it's easier if you're sure that something looking like today's date 
won't show up in requests)

cut -d" " -f1,4 `ls -1t /var/log/httpd/access_log* | head -1` | grep 
`date '+%d/%b/%Y'` | cut -d" " -f1 | sort | uniq | wc -l

If you just wanted a quick report of hits per day, and your log files aren't 
rolled and compressed:

cat `ls -1tr /var/log/httpd/access_log*` | cut -d\[ -f2 | cut -d: -f1 | 
uniq -c | more

(note that that last one isn't always clean ... the dates logged are when the 
request started, but they're logged when the script finishes, so sometimes 
you'll get something strange like:

12354 23/Oct/2014 
3 24/Oct/2014
1 23/Oct/2014
14593 24/Oct/2014

... but if you try to use `sort`, and you cross months, it'll sort 
alphabetically, not chronologically)

You could probably dedicate another full day to sed & awk, if you wanted ... or 
teach them enough perl to be dangerous.
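For instance, the hits-per-day report above collapses to a single awk pass.
A hedged sketch (it assumes common-format dates like [27/Oct/2014:...], and
awk iterates its arrays in an unspecified order, so pipe through sort if
order matters):

```shell
# Count hits per day in one pass: after splitting on '[', field 2
# starts with 27/Oct/2014:HH:MM:SS, so splitting *that* on ':'
# leaves just the date in d[1].
awk -F'[' '{ split($2, d, ":"); count[d[1]]++ }
           END { for (day in count) print count[day], day }' \
    /var/log/httpd/access_log
```

Unlike the cut|uniq -c version, this also gets the totals right when log 
lines for the same day aren't contiguous.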


-Joe


* I've taken all of the Oracle DBA classes back in the 8i days (normally 4 
weeks if taken as full-day classes), plus Oracle's data modeling and sql tuning 
classes (4-5 days each?)


Re: [CODE4LIB] CrossRef/DOI content-negotiation for metadata lookup?

2014-10-23 Thread Joe Hourcle
On Oct 23, 2014, at 11:19 AM, Jonathan Rochkind wrote:

> Hi, the DOI system supports some metadata lookup via HTTP content-negotiation.
> 
> I found this blog post talking about CrossRef's support:
> 
> http://www.crossref.org/CrossTech/2011/04/content_negotiation_for_crossr.html
> 
> But I know DataCite supports it to some extent too.
> 
> Does anyone know if there's overall registrar-agnostic documentation from DOI 
> for this service?

None that I'm aware of.  We've actually been discussing this issue in a 
breakout from 'Data Citation Implementors Group', and I think we're currently 
leaning towards not relying solely on content negotiation, but also using HTTP 
Link headers or HTML link elements to make it possible to discover the other 
formats in which the metadata may be available.

If you dig into the OAI-ORE documentation, they specifically mention one of the 
problems of using Content Negotiation is that you can't tell exactly what 
someone's asking for solely based on the Accept header ... do they want a 
resource map to the content, or just the metadata from the splash / landing 
page?



> Or, if there's kept-updated documentation from CrossRef and/or DataCite on it?

It looks like the one for CrossRef is :

http://www.crosscite.org/cn/

... if you go to the documentation for DataCite, it still has 'Beta' in the 
title:

"DataCite Content Resolver Beta"
http://data.datacite.org/static/index.html  
(note : http://data.datacite.org/ redirects here, which is linked from 
https://www.datacite.org/services )


> From that blog post, it says rdf+xml, turtle, and atom+xml should all be 
> supported as response formats.
> 
> But atom+xml seems to not be supported -- if I try the very example from that 
> blog post, I just get a 406 "No Acceptable Resource Available".
> 
> I am not sure if this is a bug, or if CrossRef at a later point than that 
> blog post decided not to support atom+xml. Anyone know how I'd find out, or 
> get more information?


The link I gave to CrossRef documentation has three e-mail addresses at the 
bottom, if you wanted to ask them if the documentation is still current*:

6 Getting Help
Please contact l...@crossref.org, t...@datacite.org or 
medrast...@cineca.it for support.


-Joe


* and this is why when I used to maintain documentation, every document had 
both 'last revised' and 'last reviewed' date on 'em, so you had a clue how 
likely they were to be out of date.


Re: [CODE4LIB] Citation Publication Tool

2014-10-22 Thread Joe Hourcle
On Oct 22, 2014, at 4:10 PM, Bigwood, David wrote:

> Any suggestions for publishing citations on the Web? We have a department 
> that has lots of publications with citations at the end of each. Keeping the 
> citations up-to-date is a chore.
> 
> Many here use Endnotes, and I know that can publish to the Web. Any examples 
> I can view? Would Libguides be something to consider? Any other suggestions 
> for easily getting different groups of citations up in multiple places?
> 
> Some examples of the pages involved:
> http://www.lpi.usra.edu/education/explore/LifeOnMars/resources/
> http://www.lpi.usra.edu/education/explore/solar_system/resources/
> http://www.lpi.usra.edu/education/explore/space_health/resources/


Based on the pages that you've linked to, I wouldn't call those 'citations'*

I've seen them called different things, depending on the reason for creating 
the lists, and the intended audience.

For instance, if they're lists of scholarly resources (books & journal 
articles, maybe presentations & theses) that make use of your group's data, 
then it's either an 'Observatory Bibliography' or 'Telescope Bibliography' 
depending on the scope, and sometimes just 'Publication List'.  Those are 
actually an easy case in our field, as The Astrophysics Data System indexes the 
main journals in our field, so you just need software that can look up metadata 
from bibcodes:


http://adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=2004SPIE.5493..163H&data_type=BIBTEX&db_key=AST&nocookieset=1

'Publication Lists' are a little harder, as they often include "Public Press" 
coverage (ie, media intended for non-scientists such as newspaper / website / 
tv news / magazines), which ADS doesn't index.  (and you often want to grab a 
snapshot of them, in case it disappears)

For what you have ... although you have some links to formally published items, 
it looks to be more links to various websites with more information on a topic. 
 I've heard them informally referred to as "EPO Resource Pages" (EPO == 
Education & Public Outreach) or, if specifically for teachers, "Educator 
Resource Pages".  I've typically seen them organized first by intended age 
level, then 
by the type of resource.  (organizing how you have it is generally for 
bean-counting when it comes time for senior reviews).

...

As for software recommendations ... if you're already using a CMS, I'd look to 
see if has any add-ons for managing either bibliographies or just lists of 
external links.  

If you're looking for stand alone software, I'd look for 'Reference Manager' or 
'Bibliography Manager' software that can generate HTML to post online.  There 
are some that allow you to manage everything online, but then you have to be 
worried about securing it** :

http://en.wikipedia.org/wiki/Comparison_of_reference_management_software

I'm not aware of any that have specifically been built for EPO purposes, but 
many of them have ways to add extra fields, so you could handle intended 
audience and your current classification that way.

-Joe

* There was actually an issue that came up during the work on the 'Joint 
Declaration of Data Citation Principles' that makes me believe that there are 
at least 6 different things that people may mean by 'citation', and yours would 
likely be a 7th.  See http://docs.virtualsolar.org/wiki/CitationVocabulary

** We had to drop the one we were using after a SQL injection, and my boss 
decided to ban all PHP on our network, so we rolled back to use 10+ year old 
software that had been written for another mission.


Re: [CODE4LIB] Linux distro for librarians

2014-10-19 Thread Joe Hourcle
On Oct 19, 2014, at 3:20 PM, Francis Kayiwa wrote:

[trimmed]

>  I'm willing to bet it would be much less effort to fix this Ubuntu problem 
> dealing with the Ubuntu devs (I've found them reasonable to work with) than 
> trying to heard the cats around "yet another debian fork"


Another alternative would be to pick an existing OS, and make sure that all of 
the requisite packages are in their package manager.

-Joe

ps.  'OS for librarians' was never defined as being (1) for servers at 
libraries, (2) for librarian workstations, or (3) for public-use machines.  
Things that make a good client machine don't always make for a good server.  
And what makes a good personally managed desktop doesn't necessarily make it a 
good desktop when you're managing dozens or hundreds.  (take MacOSX ... 
replacing bits to make it 'easier' for users, but harder to manage remotely in 
bulk)


[CODE4LIB] Citation hackathon tomorrow at PLOS

2014-10-17 Thread Joe Hourcle
I was just looking at the PLOS website, and noticed they had a banner:

PLOS is hosting a hackathon on Saturday, October 18th, 2014 at our SF 
office.

So, if you're in the San Francisco area, and are interested in citations (the 
theme of the hackathon), and don't have plans for tomorrow, see their website 
for more details and to RSVP:

http://www.ploslabs.org/citation-hackathon

-Joe


Re: [CODE4LIB] Requesting a Little IE Assistance

2014-10-14 Thread Joe Hourcle
It sounds like the issue already has a solution, but ...



On Oct 13, 2014, at 10:13 PM, Matthew Sherman wrote:

> The DSpace angle also complicates things a bit
> as they do not have any built in CSS that I could edit for this purpose.  I
> am hoping they will be amenable to the suggestions to right click and open
> in notepad because txt files are darn preservation friendly and readable
> with almost anything since they are some of the simplest files in
> computing.  Thanks for the input folks.


I'm not a DSpace user, but my understanding is that it's not a stand-alone
webserver ... which means that you may still have ways to re-write what
gets served out of it.

For instance, if you're running Apache you can build an 'output filter'.

I've only done them via mod_perl, but some quick research points to 
mod_ext_filter to call any command as a filter: 
http://httpd.apache.org/docs/2.2/mod/mod_ext_filter.html

You'd then set up a 'smart filter' to trigger this when you
had a text/plain response and the UserAgent is IE ... but the syntax
is ... complex, to put it nicely:

http://httpd.apache.org/docs/2.2/mod/mod_filter.html

(I've never configured a smart filter myself, and searching for
useful examples isn't really panning out for me).
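For the archives, here's a rough, untested sketch of what such a 
configuration might look like, shelling out to fold(1) via mod_ext_filter.
The directive names come from the Apache 2.2 docs linked above, but the 
mod_filter dispatch line is from memory, so treat it as a starting point 
rather than working config:

```apache
# Untested sketch: wrap text/plain response bodies at 79 columns,
# but only when the User-Agent looks like Internet Explorer.
ExtFilterDefine wrap-text mode=output intype=text/plain \
    cmd="/usr/bin/fold -s -w 79"
FilterDeclare  ie-wrap
FilterProvider ie-wrap wrap-text req=User-Agent $MSIE
FilterChain    ie-wrap
```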

... but I thought I'd mention this as an option for anyone who
might have similar problems in the future, as it lets you mess
with images and other types of content, too.

-Joe


Re: [CODE4LIB] Requesting a Little IE Assistance

2014-10-13 Thread Joe Hourcle
On Oct 13, 2014, at 5:15 PM, Kyle Banerjee wrote:

> You could encode it quotable-printable or mess with content disposition
> http headers.

Oh, please not quoted-printable.  That's=
the one that makes you think that something=
is wrong with your mail client because=
there are strange equals signs (=3D) all=
over the place.

-Joe


Re: [CODE4LIB] Requesting a Little IE Assistance

2014-10-13 Thread Joe Hourcle
On Oct 13, 2014, at 9:59 AM, Matthew Sherman wrote:

> For anyone who knows Internet Explore, is there a way to tell it to use
> word wrap when it displays txt files?  This is an odd question but one of
> my supervisors exclusively uses IE and is going to try to force me to
> reupload hundreds of archived permissions e-mails as text files to a
> repository in a different, less preservable, file format if I cannot tell
> them how to turn on word wrap.  Yes it is as crazy as it sounds.  Any
> assistance is welcome.

If there's a way to do it, it likely wouldn't be something that 
you could send from the server.

Depending on the web server that you're using, you might be able to
use client detection, and then pass requests from IE through a CGI
(or similar) that does the line-wrapping ... or wraps it in HTML.

If you go the HTML route, you might be able to just put the whole
thing in a <pre> element.


If you *do* have to modify all of the text files, as you specifically
mention that they're e-mails, I'd recommend looking at 'flowed'
formatting, which uses lines of at most 79 characters, with a trailing
SP before the CRLF to mark 'soft' returns:

https://www.ietf.org/rfc/rfc2646.txt
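To see the soft-return mechanics, a hedged sketch: fold -s happens to break 
*after* a blank, leaving the space at the end of each wrapped line, which is 
roughly the SP-before-the-break convention format=flowed uses.  (Real flowed 
text also has quoting and space-stuffing rules this ignores, and uses widths 
up to 79; 20 here is just for readability.)

```shell
# Wrap a long line at <=20 columns; each wrapped line keeps a
# trailing space, visible here via sed's 'l' command, which marks
# the true end of each line with a '$'.
printf 'one two three four five six seven eight nine ten\n' |
    fold -s -w 20 | sed -n 'l'
```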

You could also try just setting an HTTP header to 'Format: Flowed'
and see if IE will handle it from there.  (I'd test myself, but
I don't have IE to test with)

-Joe


Re: [CODE4LIB] Informal survey regarding library website liberty

2014-09-02 Thread Joe Hourcle
On Sep 2, 2014, at 11:39 AM, Brad Coffield wrote:

> Hi all,
> 
> I would love to hear from people about what sort of setup they have
> regarding linkage/collaboration/constrictions/freedom regarding campus-wide
> IT practices and CMS usage and the library website.

[trimmed]


> I'm hoping that I can get some responses from you all that way I can
> informally say "of x libraries that responded y of them are not firmly tied
> to IT." (or something to that effect) I'm also very curious to read
> responses because I'm sure they will be educational and help me to make our
> site better.
> 
> THE QUESTION:
> 
> What kind of setup does your library have regarding servers, IT dept
> collaboration, CMS restrictions, anything else? I imagine that there are
> many unique situations. Any input you're willing to provide will be very
> welcome and useful.


So, rather than answer the question directly (as I don't work for a library),
I'll note that I worked in central IT for a university for ~7 years:

If you're going to consider using central IT for your infrastructure,
ask them what sort of service guarantees they're willing to provide.
This is typically called a 'Service Level Agreement', where they
spell out who's responsible for what, response times, acceptable
downtime / maintenance windows, etc.  It may include costs, but that
may be a separate document.

Typically, the hosted solutions are best when you've just got a few pages
that rarely get updated (once a year or so); if you're pulling info from a
database to display on a website, most shared solutions fall flat on their
face.  They might have a database where you could store stuff to make
data-driven web pages, but they rarely are flexible enough to interface
with some external server.

So, anyway ... it doesn't matter what other schools do if your IT dept.
can't provide the services you need.  If they *can* provide it, you need
to weight costs vs. level of service ... the cost savings may not be
worth it if they regularly take the server down for maintenance at times
when you need it.

-Joe


Re: [CODE4LIB] Hiring strategy for a library programmer with tight budget - thoughts?

2014-08-15 Thread Joe Hourcle
On Aug 15, 2014, at 2:49 PM, BWS Johnson wrote:

> Salvete!
> 
> 
>> My first thought was a project-based contract, too. But there are few
>> programmer projects that would require zero maintenance once finished. As
>> someone who has had to pick up projects "completed" by others, there 
>> are
>> always bugs, gaps in documentation, and difficult upgrade paths.
> 
> There could be follow up contracts for those problems, or they might be 
> less of a hassle for in house staff to handle than trying to do absolutely 
> errything from scratch.


That actually made me think of something -- 

I've worked in places where we've had issues with people brought in
as short-term contract developers.  The problem is ... the code was
crap.  As they didn't have to maintain it for the long run, they
wrote some really sloppy code.

I know of one group who brought someone in, they poo-pooed all of
the code, and insisted it had to be re-written (so they did ... in 
ksh ... without quoting anything ... and loading config files by
sourcing them)

... but of course, he was on an hourly contract, so he had a vested
interest in making more work for himself.  (and for me, as I was
then responsible for integrating their system w/ one that I maintain).

You also get cases where every change in the specs requires new
negotiation of payment.  (like the whole healthcare.gov thing)

...

so to sum up ... if you don't already have an established
relationship with the person, I'd avoid bringing in someone to
telework.

-Joe




>> So I have no solutions to offer. Enticing people with telework is a good
>> idea. It's disappointing to see libraries (and higher ed more generally)
>> continuing to not invest in software development. We need developers. If we
>> cannot find the money for them, perhaps we should re-evaluate our
>> (budgetary?) priorities.
>> 
> 
> 
> Anytime I see things which I think more than one Library would like to 
> have I think "Caw, innit that what a Consortium is for?" One member alone 
> might not be able to afford a swank techie, but perhaps pooling resources 
> across Libraries would let you hire someone at an attractive salary for the 
> long haul while getting all of the members' projects knocked out. It would 
> also mean that you don't have to do any of those nasty follow up contracts 
> since the person that made it would still be about.


I'm pretty sure that there was someone on this list a few years back who made a 
comment if every library contributed 10% of an FTE of funding, we could fund a 
lot of developers.


Re: [CODE4LIB] Hiring strategy for a library programmer with tight budget - thoughts?

2014-08-15 Thread Joe Hourcle
On Aug 15, 2014, at 12:44 PM, Kim, Bohyun wrote:

> I am in a situation in which a university has a set salary guideline for 
> programmer position classifications and if I want to hire an entry-lever dev, 
> the salary is too low to be competitive and if I want to hire a more 
> experienced dev in a higher classification, the competitive salary amount 
> exceeds what my library cannot afford. So as a compromise I am thinking about 
> going the route of posting a half-time position in a higher classification so 
> that the salary would be at least competitive. It will get full-time benefits 
> on a pro-rated basis. But I am wondering if this strategy would be viable or 
> not.
> 
> Also, does anyone have experience in hiring a developer to telework completely 
> from another state when you do not have previous experience working with 
> her/him? This seems a bit risky strategy to me but I am wondering if it may 
> attract more candidates particularly when the position is half time.
> 
> As a current/past/future library programmer or hiring manager in IT or both, 
> if you have any thoughts, experience, or ideas, I would really appreciate it.


Salary's not the only factor when it comes to hiring ... convenience and work 
environment are a factor, too.

If I were you, I'd look to hire a half-time employee, and let them have 
flexible hours, so you could pick up a current student.  Offering them 
reduced tuition or parking (which matters at some campuses ... at College Park, 
just getting 'em into a lot that's closer to their classes) might make up for a 
less-competitive salary.

You should also check with the university's legal department, as you have a 
class of students who specifically *can't* work full time (foreigners on 
student visas), so you might be able to hire a grad student who would otherwise 
have problems getting hired.  Especially in the D.C. area, they have a hard time 
finding jobs (as so many companies are tied to the federal government, they 
don't want to hire non-US citizens).

...

As for the telework aspect -- it's a pain to get set up from nothing.  If you 
have someone that you're comfortable with and they move away, that's completely 
different from bringing in someone who doesn't have a vested relationship in 
the group.  At the very least, I'd recommend bring them in for an orientation 
period (2-8 weeks), where you can get a feel for their work ethic & such.

Most of the people on the project I'm on are remote ... but we keep an IM group 
chat window up all the time, and we have meetings 1-3 times per year where we 
all get together for a week to hash out various issues and keep the 
relationships strong.

-Joe


Re: [CODE4LIB] Dewey code

2014-08-11 Thread Joe Hourcle
On Aug 8, 2014, at 10:13 PM, Riley Childs wrote:

> Ok, so you want to access LC data to get Dewey decimal numbers? You need to 
> use a z39.50 client to pull the record, you can do it with marc edit but it 
> is labor intensive.  You would need to roll your own solution for this or use 
> classify.oclc.org to get book info (this doesn't give you API access). Your 
> best bet is classify.oclc.org.
> 
> That aside:
> Honestly you might be better off running with something like Koha, writing a 
> home brew library system is no cake walk, trust me I know from 2 years of 
> experience trying to code one and ultimately moving to koha. Koha can be run 
> on a VPS (Digital Ocean is what i would use) or on an old PC in the corner. I 
> am in a situation similar to yours if you want to contact me off list I can 
> give you some advice.


I 100% agree -- you'd be better off going with something intended for personal 
libraries (eg Delicious Library) and giving it a dedicated machine before trying 
to roll your own.

oss4lib hasn't been updated in a while, but Lyrasis is maintaining foss4lib.org 
as a catalog of free & open source library software, and has a 'ILS feature 
comparison tool' which lists feature differences between Koha and Evergreen:

http://ils.foss4lib.org/

-Joe


Re: [CODE4LIB] very large image display?

2014-07-26 Thread Joe Hourcle
On Jul 25, 2014, at 11:36 AM, Jonathan Rochkind wrote:

> Does anyone have a good solution to recommend for display of very large 
> images on the web?  I'm thinking of something that supports pan and scan, as 
> well as loading only certain tiles for the current view to avoid loading an 
> entire giant image.
> 
> A URL to more info to learn about things would be another way of answering 
> this question, especially if it involves special server-side software.  I'm 
> not sure where to begin. Googling around I can't find any clearly good 
> solutions.
> 
> Has anyone done this before and been happy with a solution?


If you store the images in JPEG2000, you can pull tiles or different 
resolutions out via JPIP (JPEG 2000 Interactive Protocol)

Unfortunately, most web browsers don't support JPIP directly, so you have to 
set up a proxy for it.

For an example, see Helioviewer:

http://helioviewer.org/

Documentation and links to their JPIP server are available at:

http://wiki.helioviewer.org/wiki/JPIP_Server

-Joe


Re: [CODE4LIB] Publishing large datasets

2014-07-23 Thread Joe Hourcle
On Jul 23, 2014, at 5:29 PM, Kyle Banerjee wrote:

> We've been facing increasing requests to help researchers publish datasets.
> There are many dimensions to this problem, but one of them is applying
> appropriate metadata and mounting them so they can be explored with a
> regular web browser or downloaded by expert users using specialized tools.
> 
> Datasets often are large. One that we used for a pilot project contained
> well over 10,000 objects with a total size of about 1 TB. We've been asked
> to help with much larger and more complex datasets.
> 
> The pilot was successful but our current process is neither scalable nor
> sustainable. We have some ideas on how to proceed, but we're mostly making
> things up. Are there methods/tools/etc you've found helpful? Also, where
> should we look for ideas? Thanks,


The tools I use are too customized for our field to be of much use to anyone 
else, so can't help on that part of the question.


I'd really recommend trying to reach out to someone working in data informatics 
in the field that the data is from, as they would have recommendations on 
specific metadata that should be captured.


For the general 'data publication' community, it's coalescing, but still a bit 
all over the place.  Here are some of the ones that I know about:

JISC has a 'Data Publication' mailing list:

https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=DATA-PUBLICATION

ASIS&T runs a 'Research Data Access & Preservation' conference and 
mailing list:

http://www.asis.org/rdap/
http://mail.asis.org/mailman/listinfo/rdap

... and they put most of the presentations up on slideshare:

http://www.slideshare.net/asist_org/

The Research Data Alliance has two working groups on the topic, 
Publishing Services and Publishing Data Workflows:

https://rd-alliance.org/group/rdawds-publishing-services-wg.html

https://rd-alliance.org/group/rdawds-publishing-data-workflows-wg.html


I'm also one of the moderators of the Open Data site on Stack Exchange, which 
has some questions that might be relevant:

Let's suppose I have potentially interesting data. How to distribute?
http://opendata.stackexchange.com/q/768/263

Benefits of using CC0 over CC-BY for data
http://opendata.stackexchange.com/q/26/263

... or just ask a new question.


I'd also recommend that when you catalog your data, you also consider 
adding DataCite metadata, so that we can try to make it easier for others to 
cite your data.   (specific implementation recommendations for data citation 
are still evolving, but general principles have been released; if you have 
questions, feel free to ask me, as I think we need to add some clarification to 
what we mean on some of the items).

http://www.datacite.org/
https://www.force11.org/datacitation


As I see it, you're dealing with data that's in the problem range -- if it were 
larger, the department collecting the data would have a system in place 
already; if it were smaller, it would be easier to manage as a single item for deposit.


-Joe


Re: [CODE4LIB] net.fun

2014-07-14 Thread Joe Hourcle
On Jul 14, 2014, at 5:25 PM, Lisa Rabey wrote:

> "The cause of the problem is:
> /dev/clue was linked to /dev/null"
> 
> Teehee.
> 
> http://pages.cs.wisc.edu/~ballard/bofh/bofhserver.pl


It's difficult to use the excuse 'solar flares' when your boss (1) is a solar 
physicist and (2) reads BOFH.

http://bofh.ntk.net/BOFH//bastard06.php


> On Mon, Jul 14, 2014 at 1:02 PM, Kyle Banerjee  
> wrote:
>> The only problem is that some people might have difficulty obtaining audio
>> modems that could be made to work with their cell phones...


So in um ... spring of 1995, I think it was ... we managed to get a car phone 
from Bell Atlantic (might've been Bell Atlantic-NYNEX at that point), and the 
phone had an RJ-11 jack on it.

... which of course meant that we had to see if we could hold up a modem 
connection while on the road.  Unfortunately, the best that we could manage to 
get was about 2400 baud for any extended periods.  We had our best transfer 
rates (9600 baud?) up near the NSA campus along the BW Parkway.

Mind you, this was in the days when modems topped out at 33.6k ('x2' and 
'kFlex' didn't come along 'til a year or two later, then finally v90) ... the 
modem banks we were dialing into might've only been 14.4k or 28.8k.

This was also in the days of analog cell service, as PCS didn't come out 'til 
even later ... once it did, the sysadmin for the ISP I worked at got cables so 
he could dial out from what today we'd call a 'netbook' (back then it was just 
a really tiny laptop).  This was also the days when you could keep a computer 
on your lap without it crushing you (the 'portable' aka 'luggable' era) or 
burning your crotch (the current 'notebook' era).

... but I still think we could pull off 1200 baud w/ an acoustic coupler over a 
cell phone, which is about the bare minimum for MUDs in the mid 1990s, and 
would've been fine for BBSes, as long as you weren't dealing in warez.

-Joe

ps.  wow, this whole conversation is making me feel old ... doesn't help that I 
blew my back out last week, so I was already feeling old before the day started.


Re: [CODE4LIB] net.fun

2014-07-14 Thread Joe Hourcle
On Jul 14, 2014, at 11:56 AM, Riley Childs wrote:

> I know I might be little youn but code4lib needs a bbs

I can see it now ... someone re-writing TradeWars 2000 so you're an 
intergalactic bookmobile.

-Joe


Re: [CODE4LIB] net.fun

2014-07-14 Thread Joe Hourcle
On Jul 14, 2014, at 10:44 AM, Cary Gordon wrote:

> I remember when system administrators would change the MOTD daily. The '80s
> were so pastoral.

0 0 * * * /bin/fortune > /etc/motd

or, for those running Vixie cron (which most people weren't in the '80s):

@daily /bin/fortune > /etc/motd


... but then, everyone went the way of 'web portals' and the like, rather than 
assuming everyone was going to be (telnet|tn3270)ing into a (unix|cms) system 
so they could check their e-mail, nntp, gopher, etc.

-Joe

ps. is it disturbing that the talk of motd is making me nostalgic for ASCII art?





> On Monday, July 14, 2014, Joe Hourcle  wrote:
> 
>> On Jul 14, 2014, at 8:21 AM, Riley Childs wrote:
>> 
>>> My MOTDs are not as fun...
>>> 
>>> RUN GET OUT OF HERE
>>> YOU ARE NOT WELCOME TODAY
>>> RESTRICTED ACCESS HERE.
>> 
>> I would expect that in the banner, not the motd:
>> 
>>$ more /etc/banner
>> 
>>This US Government computer is for authorized users only. By
>> accessing
>>this system you are consenting to complete monitoring with no
>>expectation of privacy. Unauthorized access or use may subject you
>> to
>>disciplinary action and criminal prosecution.
>> 
>> 
>> The banner gets displayed before the login prompt, the motd gets displayed
>> after ... there's also an assumption that the motd changes regularly, as
>> it's 'message of the day' ... although most people have it be completely
>> random and just call fortune or never bother changing it.
>> 
>> -Joe
>> 
> 
> 
> -- 
> Cary Gordon
> The Cherry Hill Company
> http://chillco.com


Re: [CODE4LIB] net.fun

2014-07-14 Thread Joe Hourcle
On Jul 14, 2014, at 8:21 AM, Riley Childs wrote:

> My MOTDs are not as fun...
> 
> RUN GET OUT OF HERE
> YOU ARE NOT WELCOME TODAY
> RESTRICTED ACCESS HERE.

I would expect that in the banner, not the motd:

$ more /etc/banner

This US Government computer is for authorized users only. By accessing  
this system you are consenting to complete monitoring with no  
expectation of privacy. Unauthorized access or use may subject you to  
disciplinary action and criminal prosecution.


The banner gets displayed before the login prompt, the motd gets displayed 
after ... there's also an assumption that the motd changes regularly, as it's 
'message of the day' ... although most people have it be completely random and 
just call fortune or never bother changing it.

-Joe


Re: [CODE4LIB] Why not Sharepoint?

2014-07-11 Thread Joe Hourcle
On Jul 11, 2014, at 10:33 AM, Thomas Kula wrote:

> On Fri, Jul 11, 2014 at 10:10:40AM -0400, Jacob Ratliff wrote:
>> Hi Ned,
>> 
>> The biggest case for SP is boiled down to 2 things in my mind.
>> 1) its terrible at preservation. If you are just using it as a digital
>> asset mgmt system its fine, but if you need the preservation component go
>> with something else.
> 
> I've never used Sharepoint, but really it boils down to coming up with a
> list of requirements for a digital preservation storage system:
> 
> - It must have an audit log of who did what to what when
> - It must do fixity checking of digital assets
>   - At minimum, it must tell you when a fixity check fails
>   - It really should be able to recover from fixity check
> failures when an object is read
>   - Ideally it should discover these *before* an object is
> accessed, recover, and notify someone
> - It must support rich enough metadata for your objects
> - It must meet your preservation needs (N copies distributed over
>   X distance within Y hours)
> - It must be scalable to handle anticipated future growth.
> 
> I'm sure there are more, I haven't had much coffee yet this morning so
> I'm missing some. And honestly, you have to scale your requirements to
> what your specific needs are.
> 
> *Only* then can you evaluate solutions. If you've got a list of
> requirements, you can then ask "I need this. How well does SP (or any
> other possible solution) meet this need?"

So that it doesn't look like you're just coming up with cases that
Sharepoint can't handle, you might consider something like the
TRAC checklist:

2007 version, from CRL:
http://www.crl.edu/sites/default/files/attachments/pages/trac_0.pdf
2011 update from CCSDS:
http://public.ccsds.org/publications/archive/652x0m1.pdf

The 2011 update should mirror what's in ISO 16363.

Most of the other certifications that I've seen look more at the 
organization, and don't have specific portions for technology.

-Joe


ps.  A quick search for 'SharePoint' and 'OAIS' led me to :

http://www.eprints.org/events/or2011/hargood.pdf

... which as best I can tell is the abstract for a poster at OR2011.


Re: [CODE4LIB] Is ISNI / ISO 27729:2012 a name identifier or an entity identifier?

2014-06-20 Thread Joe Hourcle
On Jun 20, 2014, at 4:30 PM, Karen Coyle wrote:

> On 6/20/14, 11:38 AM, Richard Wallis wrote:
>> In what ways does ISNI support linked data?
>> 
>> See: http://www.isni.org/how-isni-works#HowItWorks_LinkedData
> 
> " accessible by a persistent URI in the form 
> isni-url.oclc.nl/isni/000134596520 (for example)  and soon also in the 
> form isni.org/isni/000134596520. "
> 
> Odd. I assume that whoever wrote that on their page just forgot the "http://" 
> part of those strings. Right?

People think I'm being pedantic when I bitch about the protocol
missing for printed materials (flyers, business cards, etc) ...
but in this case, it's a definite violation of RFC 3986:

  1.1.1.  Generic Syntax
 Each URI begins with a scheme name, as defined in Section 3.1, that
 refers to a specification for assigning identifiers within that
 scheme.  As such, the URI syntax is a federated and extensible naming
 system wherein each scheme's specification may further restrict the
 syntax and semantics of identifiers using that scheme.

Now, it's possible that this whole "we don't need to bother with
http://" thing has spilled into the CMS building community, and
they're actively stripping it out.  From their page, I think they're
using Drupal, but the horrible block of HTML that this was in is
blatantly MS Word's 'save as HTML' foulness:

  Linked 
Data
  Linked data is part of the ISNI-IA’s 
strategy to make ISNIs freely available and widely diffused.  Each 
assigned ISNI is accessible by a persistent URI in the form 
isni-url.oclc.nl/isni/000134596520 (for example)  and soon also in the 
form isni.org/isni/000134596520. 
  Coming soon:  ISNI core metadata 
in RDF triples.  The RDF triples will be embedded in the public web pages 
and the format will be available via the persistent URI and the SRU search 
API.
   

-Joe



>> 
>> On 20 June 2014 18:57, Eric Lease Morgan  wrote:
>> 
>>> On Jun 20, 2014, at 10:56 AM, Richard Wallis <
>>> richard.wal...@dataliberate.com> wrote:
>>> 
  authority control|simple identifier |Linked Data capability
 +-+--+--+
  VIAF   |X|X |  X   |
 +-+--+--+
  ORCID  | |X |  |
 +-+--+--+
   ISNI  |X|X |  X   |
 +-+--+--+
>>> 
>>> Increasingly I like linked data, and consequently, here is clarification
>>> and a question. ORCID does support RDF, but only barely. It can output
>>> FOAF-like data, but not bibliographic. Moreover, it is experimental, at
>>> best:
>>> 
>>>   curl -L -H 'accept: application/rdf+xml'
>>> http://orcid.org/-0002-9952-7800
>>> 
>>> In what ways does ISNI support linked data?
>>> 
>>> ---
>>> Eric Morgan
>>> 
>> 
>> 
> 
> -- 
> Karen Coyle
> kco...@kcoyle.net http://kcoyle.net
> m: 1-510-435-8234
> skype: kcoylenet


Re: [CODE4LIB] College Question!

2014-05-29 Thread Joe Hourcle
On May 28, 2014, at 11:17 PM, Riley Childs wrote:

> I was curious about the type of degrees people had. I am heading off to 
> college next year (class of 2015) and am trying to figure out what to major 
> in. I want to be a systems librarian, but I can't tell what to major in! I 
> wanted to hear about what paths people took and how they ended up where they 
> are now.

What paths we took?   Well, I'm in the mood for procrastinating, so here goes.

...

Mine started well before college.

My dad got our family a computer (Apple IIe) when I was in 3rd or 4th grade ... 
so I learned Basic back in the days when you'd copy program listings from 
magazines.

In middle school, I learned Logo, and in 8th grade was an aide for the computer 
lab.  One summer I went to a two-week camp, and learned Pascal, and the 
difference between Basic and Basica.  During this time, my mom worked for a 
computer company, and we upgraded to an Apple ][gs.

My high school was a 'science and tech' school.  I had 2.5 years of drafting, 2 
years of commercial graphics, and by senior year I was working as a TA in the 
computer lab, and had an independent study in the school's print shop.  Through 
this time, we upgraded to a Macintosh SE/30 and then a Macintosh IIci.

For summers in high school, I was working as an intern for an office of the 
Department of Defense (my dad was military), and I learned a few other OSes, 
including ALIS (a window manager for Sun UNIX boxes).  I was also calling into 
BBSes quite regularly (had started back in middle school w/ a 1200 baud modem).

In college, I had planned to work towards a degree in Architectural 
Engineering, but my dad taught at a school that didn't offer it ... so I 
started a degree in Civil Engineering.

After my freshman year, I started working in the university's academic 
computing center.  (They managed the computer labs & the general use UNIX & CMS 
machines).  I started off doing general helpdesk support, but by my junior year 
that whole 'world wide web' thing was getting popular.

As I had experience with computer programming, databases, desktop publishing, 
graphics, etc., I ended up splitting my time between the helpdesk and the 
newly formed 'web development team' ... which was two of us (both students), 
working half time.  And I was getting to be a fairly fast typist from mudding.

After my sophomore year, Tim, the other member of our 'web development team' 
graduated, and went to work full time, while I was half time.  We grew to four 
people (3 half time, as we were full time students), and we did some cutting 
edge stuff to get all of the university's course information online (required 
parsing quark xpress files to generate HTML, parsing dumps from the 
university's course registration system, and generating HTML, etc) ... and so 
Tim got offered a job to go work for Harvard.

Through this time, I helped out on the university's solar car team, and got 
distracted and never got around to switching to a school for architecture.

I ended up taking over in managing the university's web server while they tried 
to find a new manager for our group.  (this was back when 'webmaster' meant 
'web server administrator' and not 'person who designs web pages')  I learned 
Perl, to go along with the various shell scripting that I had already learned.  
I picked up the 'UNIX System Administration Handbook' and learned from our 
group's sysadmins until I was trusted to manage that server.

While all of this was going on, as I had taken enough classes to be 1/2 a 
semester off from my classmates, I never realized that I was supposed to take 
the EIT (Engineer in Training test) ... so I was a bit screwed if I wanted to 
be an engineer.  After graduation, I went to resign, as I wanted to look for a 
full time job, but the director said that they were putting in for a new 
position for me.

By the middle of summer, my new manager told me that the director had told her 
that under no circumstances was she to hire me for the job that was being 
created.  He really didn't like guys with long hair.

... but through this time, I spent some of my savings to help one of the folks 
on the mud to start an ISP  (so they could host the mud which was getting 
kicked out of the university it was at).  I was working as their webmaster, 
remotely.  After all of this crap went down at my university, I got offered to 
do some contract work at that ISP, so I moved out to Kentucky.  The first 
contract fell through, but I kept doing various coding projects for them, did 
tech support (phone and still the days when we'd drive out to people's houses 
to set up their modems).  I learned mysql in the process.

The contracting side of our company merged with another contracting company, 
but then everything fell through ... and oddly I was the only employee that 
suddenly found themselves working for a different company.  Through this time, 
I did mostly web & database work ... the ISP that I worked for

Re: [CODE4LIB] jobs digest for 2014-05-16

2014-05-16 Thread Joe Hourcle
On May 16, 2014, at 3:46 PM, Andreas Orphanides wrote:

> THIS IS SLIGHTLY DIFFERENT THAN WHAT WE DISCUSSED.

Agreed, but there's no need for shouting.

It looks to me like it's a change in the messages that 'jobs.code4lib.org' 
generates and sends to the list ... *not* the change that Eric made to the 
mailing list.

(I'm basing that on what a LISTSERV(tm) digest looks like, and the fact that 
it's archived this as a single message).

... and whoever made the change should at the very least put 'JOBS:' in the 
subject, so LISTSERV(tm) assigns it to the right topic for people to then 
ignore it.

-Joe




> 
> On Fri, May 16, 2014 at 3:44 PM,  wrote:
> 
>> Library Electronic Resources Specialist
>> Raritan Valley Community College
>> Branchburg Township, New Jersey
>> ColdFusion, EZproxy, JavaScript, Personal computer hardware
>> http://jobs.code4lib.org/job/13115
>> 
>> Digital Scholarship Specialist
>> University of Oklahoma
>> Norman, Oklahoma
>> Digital humanities, University of Oklahoma
>> http://jobs.code4lib.org/job/14593
>> 
>> Research Data Consultant
>> Virginia Polytechnic Institute and State University
>> Blacksburg, Virginia
>> Data curation, Data management, Digital library, Informatics
>> http://jobs.code4lib.org/job/14591
>> 
>> Systems Librarian
>> Central Michigan University
>> Mount Pleasant, Michigan
>> CONTENTdm, Ex Libris, Innovative Interfaces, MARC standards, Proxy server,
>> Resource Description and Access, SFX
>> http://jobs.code4lib.org/job/14590
>> 
>> To post a new job please visit http://jobs.code4lib.org/
>> 


Re: [CODE4LIB] separate list for jobs

2014-05-15 Thread Joe Hourcle

On Thu, 15 May 2014, Jodi Schneider wrote:


elm++


people still use elm?

I'm personally using the 'patterns-filters2' rule in alpine for managing 
my mailing lists.


I've considered switching to mutt, but I haven't used elm or its 
derivatives in over a decade.  (elm didn't have good MIME support, and I 
was getting tired of jumping through hoops for every attachment... 
although, it was *much* better than pine if you were connecting at 1200 
baud, as it didn't redraw the screen constantly)


-Joe




On Thu, May 15, 2014 at 6:09 PM, Eric Lease Morgan  wrote:


I have done my initial best to configure the mailing list to support a
jobs topic, and I've blogged about how you can turn off or turn on the jobs
listings. [1] From the blog:

  The Code4Lib community has also spawned job postings. Sometimes
  these job postings flood the mailing list, and while it is
  entirely possible use mail filters to exclude such postings,
  there is also "more than one way to skin a cat". Since the
  mailing list uses the LISTSERV software, the mailing list has
  been configured to support the idea of "topics", and through this
  feature a person can configure their subscription preferences to
  exclude job postings. Here's how. By default every subscriber to
  the mailing list will get all postings. If you want to turn off
  getting the jobs postings, then email the following command to
  lists...@listserv.nd.edu:

SET code4lib TOPICS: -JOBS

  If you want to turn on the jobs topic and receive the notices,
  then email the following command to lists...@listserv.nd.edu:

SET code4lib TOPICS: +JOBS

  Sorry, but if you subscribe to the mailing list in digest mode,
  then the topics command has no effect; you will get the job
  postings no matter what.

  Special thanks go to Jodi Schneider and Joe Hourcle who pointed
  me in the direction of this LISTSERV functionality. Thank you!

The LISTSERV topics feature is new to me, and I hope it works as
advertised. I think it will.

[1] blog posting - http://bit.ly/1nSCG2u

?
Eric Lease Morgan, Mailing List Owner





Re: [CODE4LIB] statistics for image sharing sites?

2014-05-14 Thread Joe Hourcle
On May 13, 2014, at 10:16 PM, Stuart Yeates wrote:

> On 05/14/2014 01:39 PM, Joe Hourcle wrote:
>> On May 13, 2014, at 9:04 PM, Stuart Yeates wrote:
>> 
>>> We have been using google analytics since October 2008 and by and large 
>>> we're pretty happy with it.
>>> 
>>> Recently I noticed that we're getting >100 hits a day from the 
>>> "Pinterest/0.1 +http://pinterest.com/" bot which I understand is a 
>>> reasonably reliable indicator of activity from that site. Much of this 
>>> activity is pure-jpeg, so there is no HTML and no opportunity to execute 
>>> javascript, so google analytics doesn't see it.
>>> 
>>> pinterest.com is absent from our referrer logs.
>>> 
>>> My main question is whether anyone has an easy tool to report on this kind 
>>> of use of our collections?
>> 
>> Set your webserver logs to include user agent (I use 'combined' logs), then 
>> use:
>> 
>>  grep Pinterest /path/to/access/logs
>> 
>> You could also use any analytic tools that work directly off of your log 
>> files.  It might not have all of the info that the javascript analytics 
>> tools pull (window size, extensions installed, etc.), but it'll work for 
>> anything, not just HTML files.
> 
> When I visit http://www.pinterest.com/search/pins/?q=nzetc I see a whole lot 
> of our images, but absolutely zero traffic in my log files, because those 
> images are cached by pinterest.

You could also go the opposite route, and deny Pinterest your images, so they 
can't cache them.

You could either use robots.txt rules, or matching rules w/in Apache to deny 
their agents absolutely.

I have no idea if they'd then link straight to your images (so that you could 
get useful stats), or if they'd just not allow it to be used on their site at 
all.


-Joe


Re: [CODE4LIB] statistics for image sharing sites?

2014-05-13 Thread Joe Hourcle
On May 13, 2014, at 9:04 PM, Stuart Yeates wrote:

> We have been using google analytics since October 2008 and by and large we're 
> pretty happy with it.
> 
> Recently I noticed that we're getting >100 hits a day from the "Pinterest/0.1 
> +http://pinterest.com/" bot which I understand is a reasonably reliable 
> indicator of activity from that site. Much of this activity is pure-jpeg, so 
> there is no HTML and no opportunity to execute javascript, so google 
> analytics doesn't see it.
> 
> pinterest.com is absent from our referrer logs.
> 
> My main question is whether anyone has an easy tool to report on this kind of 
> use of our collections?

Set your webserver logs to include user agent (I use 'combined' logs), then use:

grep Pinterest /path/to/access/logs

You could also use any analytic tools that work directly off of your log files. 
 It might not have all of the info that the javascript analytics tools pull 
(window size, extensions installed, etc.), but it'll work for anything, not 
just HTML files.
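
The grep approach generalizes easily; as a rough sketch, a few lines of stdlib 
Python can tally hits per user agent out of a combined-format log (the sample 
lines below are made up for illustration):

```python
# Count requests per user agent in an Apache "combined" log, where
# the user agent is the final double-quoted field on each line.
import re
from collections import Counter

# Hypothetical sample lines in combined log format.
LOG = '''\
1.2.3.4 - - [13/May/2014:09:00:00 +0000] "GET /img/a.jpg HTTP/1.1" 200 512 "-" "Pinterest/0.1 +http://pinterest.com/"
1.2.3.4 - - [13/May/2014:09:00:01 +0000] "GET /img/b.jpg HTTP/1.1" 200 512 "-" "Mozilla/5.0"
1.2.3.4 - - [13/May/2014:09:00:02 +0000] "GET /img/c.jpg HTTP/1.1" 200 512 "-" "Pinterest/0.1 +http://pinterest.com/"
'''

agents = Counter()
for line in LOG.splitlines():
    m = re.search(r'"([^"]*)"$', line)   # last quoted field = user agent
    if m:
        agents[m.group(1)] += 1

print(agents['Pinterest/0.1 +http://pinterest.com/'])  # 2
```

Point the same loop at your real access log files and you get per-bot counts 
without any javascript-based analytics at all.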


> My secondary question is whether any httpd gurus have recipes for redirecting 
> by agent string from low quality images to high quality. So when AGENT =  
> "Pinterest/0.1 +http://pinterest.com/" and the URL matches a pattern redirect 
> to a different pattern. For example:
> 
> http://nzetc.victoria.ac.nz/etexts/MakOldT/MakOldTP022a%28w100%29.jpg
> 
> to
> 
> http://nzetc.victoria.ac.nz/etexts/MakOldT/MakOldTP022a.jpg


Perfectly possible w/ Apache's mod_rewrite, but you didn't say what http server 
you're using.

If Apache, you'd do something like:

RewriteCond %{HTTP_USER_AGENT} ^Pinterest
RewriteRule (^/etexts/MakOldT/.*)\(.*\)\.jpg $1.jpg [L]

You might need to adjust the regex to match your URLs ... I just assumed the 
stuff in parens got stripped out of stuff in that directory.
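
One way to sanity-check a pattern like that before touching the server config 
is to replay it with an ordinary regex; this mirrors the rule above, and note 
that mod_rewrite matches against the already-decoded path, so %28/%29 arrive 
as literal parentheses:

```python
# Verify the rewrite pattern maps the low-quality image URL to the
# high-quality one.  Apache applies RewriteRule to the decoded path,
# so the escaped parentheses in the URL are matched literally.
import re

pattern = r'(^/etexts/MakOldT/.*)\(.*\)\.jpg'

def rewrite(path):
    return re.sub(pattern, r'\1.jpg', path)

low  = '/etexts/MakOldT/MakOldTP022a(w100).jpg'
high = '/etexts/MakOldT/MakOldTP022a.jpg'
print(rewrite(low) == high)  # True
```

Paths outside /etexts/MakOldT/ fall through unchanged, just as they would 
with the [L] rule.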


Re: [CODE4LIB] separate list for Jobs

2014-05-08 Thread Joe Hourcle
On May 8, 2014, at 3:54 PM, Coral Sheldon-Hess wrote:

> I have another, maybe minor, point to add to this: I've posted a job to
> Code4Lib, and I did it wrong. I have no idea how I'm supposed to make a job
> show up correctly, and now that I have realized I've done it wrong, I
> probably won't send another job to this list. (Or maybe I'll look it up in
> ... where? the wiki?)
> 
> A second list would make this a lot clearer, I think.


So, from my knowing way too much about LISTSERV(tm) brand
mailing lists, having been the primary support person
at a university for a couple of years:

There's another feature for 'sub-lists', where you can set
up parent/child relationships between lists ... so you can
have a separate address to send to for job postings
specifically:


http://www.lsoft.com/manuals/16.0/htmlhelp/list%20owners/StartingMailingLists.html#2337469



I've never tried it, but it might be possible to set the
SUBJECTHDR on the sub-list so the parent list assigns a topic
for a given sub-list.

-Joe


Re: [CODE4LIB] separate list for jobs

2014-05-08 Thread Joe Hourcle
On May 8, 2014, at 11:35 AM, Ben Brumfield wrote:

> I suspect I'm not the only mostly-lurker who subscribes to CODE4LIB in digest 
> mode, finding value in a glance over the previous day's discussions each 
> morning, then (very) occasionally weighing in on individual threads via the 
> web interface.  I find this to be more effective and efficient than 
> filtering-and-foldering individual messages, at least for my goal of  having 
> some idea of the content of the conversations here, although--not being a 
> full-time library technologist--I'm really just skimming.
> 
> I also suspect that I'm also not the only digest-mode subscriber who would 
> see value in a digest-mode option that excluded job postings.  


As this is an actual LISTSERV(tm) mailing list, it's possible for the list 
owner to define 'topics', and then for people to set up their subscription to 
exclude those they wish to ignore:


http://www.lsoft.com/manuals/16.0/htmlhelp/list%20owners/ModeratingEditingLists.html#2338132

I would suspect it would be honored even in digest mode, but I've never tried 
it.

-Joe


Re: [CODE4LIB] separate list for discussing a separate list for jobs

2014-05-06 Thread Joe Hourcle
On May 6, 2014, at 12:34 PM, Dan Chudnov wrote:

> Is it time to reconsider:  should we start a separate list for "Job:" 
> postings?  "code4lib-jobs", perhaps?

I think the real question here is if we should have a separate list for 
discussing if we need a separate list for jobs.  I propose 
'code4lib-jobs-list-discuss'.

-Joe


Re: [CODE4LIB] CD auto-loader machine and/or services to rip CD's to disk

2014-04-30 Thread Joe Hourcle
On Apr 30, 2014, at 11:31 AM, Derek Merleaux wrote:

> I have few thousand CD's and DVD's of images scanned back in the days of
> more expensive server storage. I want the files on these transferred to a
> hard-drive or cloud storage where I can get at the them and sort out the
> keepers etc.
> I have seen a lot of great home-built auto-loader machines, but sadly do
> not have time/energy right now to build my own. Looking for recommendations
> for machines and/or for a reliable service who will take my discs and put
> them a server.


Summer interns.

Well, I guess it depends on just how many thousands it is.

I'm actually surprised that there aren't any groups renting these
sorts of things out -- most efforts like this (or film scanning,
book scanning, etc.) generally only run for
a year or two, and then the gear isn't needed anymore.*

You'd think there'd be a market for folks to share the costs...
find three groups looking to do the scanning, share the up-front
costs and then pass it from place to place.

I think that IMLS has given grants for these sorts of efforts...
but if they could help match up equipment to groups that needed
it, they might be able to get better results for each dollar
spent.

-Joe

* Unless some item isn't discovered 'til later.


Re: [CODE4LIB] CFP: A Librarian's Introduction to Programming Languages

2014-03-27 Thread Joe Hourcle
On Mar 26, 2014, at 9:32 AM, Simon Spero wrote:

> I would structure the book by task, showing how different languages would
> implement the same task.
> 
> For example,
> 
> using a marc parsing library in java, groovy, python, ruby, perl,
> c/c++/objective c, Haskell.
> 
> Implementing same.
> 
> Using a rest API
> 
> Implementing a rest API
> 
> Doing statistical analysis of catalog records, circulation data , etc.
> 
> Doing knowledge based analysis of same
> --
> Treatment of each topic and language is likely to be cursory at best, and I
> am not sure who the audience would be.
> 
> A series of  " for librarians" books would seem more useful and
> easier to produce.


If you tried to put it all into a book, you'd have two issues:

1. It'd be horribly long.  (anyone remember the 'Encyclopedia
   of Graphical File Formats'?)

2. Tools change over time, and books don't.

... so instead, perhaps the code4lib community would want to try to
put some of these together on the code4lib wiki.  Eg, for the Marc one:

http://wiki.code4lib.org/index.php/Working_with_MARC

... people could contribute recipes of how they use the various
libraries that are linked in.  (or just say, look it's outdated, we
listed it, but we recommend (x) instead).

Think of it like a code golf challenge -- someone throws out a
problem, and members of the community (if they have the time)
submit their various solutions in different languages or using
different libraries.
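
As a taste of what such a recipe could look like, here's a minimal, 
dependency-free sketch of pulling one field out of a binary MARC 21 record; 
the record is built by hand purely for the demo, and real work should use one 
of the libraries linked from that page:

```python
# Minimal sketch of MARC 21 binary structure: a 24-byte leader,
# 12-byte directory entries (3-byte tag, 4-byte length, 5-byte start)
# ended by 0x1E, then the field data.  Not a substitute for a real
# MARC library.
FT, RT, SF = b'\x1e', b'\x1d', b'\x1f'   # field/record/subfield delimiters

def fields(record):
    base = int(record[12:17])            # base address of data, from leader
    directory = record[24:base - 1]      # 12-byte entries, minus the 0x1E
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode()
        length = int(entry[3:7])
        start = int(entry[7:12])
        data = record[base + start:base + start + length].rstrip(FT)
        yield tag, data

# Hand-built record with a single 245 field: indicators "00",
# subfield $a "Hello."
field = b'00' + SF + b'aHello.' + FT                 # 11 bytes
directory = b'245' + b'0011' + b'00000' + FT         # 13 bytes
leader = b'00049nam a2200037   4500'                 # 24 bytes
record = leader + directory + field + RT

for tag, data in fields(record):
    subfield_a = data.split(SF + b'a')[1].decode()
    print(tag, subfield_a)   # 245 Hello.
```

A wiki recipe per library (pymarc, MARC::Record, ruby-marc, ...) doing the 
same lookup would make the trade-offs between them obvious at a glance.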


... another possibility would be to organize something over
on stackexchange ... if you set some 'scoring criteria', we could
run them as 'code-challenges' on the codegolf site:

http://codegolf.stackexchange.com/questions/tagged/code-challenge

-Joe


Re: [CODE4LIB] Usability resources

2014-03-25 Thread Joe Hourcle
On Mar 25, 2014, at 4:07 PM, Coral Sheldon-Hess wrote:

> Some things that came up in the UX discussion (well, the third of it I was
> in) at the breakout session, about how to get your library to be more open
> to UX:

[trimmed, although, I agree on the Steve Krug books]

> I apologize for the self promotion, but not all libraries' cultures allow
> for the "big public test" approach. Mine ... might, now, but probably
> wouldn't have, a couple of years ago.


There's been a recommendation for years that big public tests are a 
waste of people's time ... you don't do that until it's effectively
a release candidate.

Here are the problems:

(1) there's going to be one or two problems that are the majority
of the problem reports.

(2) once everyone's tested out the buggy version, they're tainted
so can't be a clean slate when testing the next version.

Most recommendations that I've seen call for 3-5 testers for each
iteration, with 2-3 being preferred if you're doing fast cycles. [1]

Yes, you can run into the one tester with completely unreasonable
demands about how things should be done, but if your programmers
don't see how stupid the ideas are, they should be shown to be
horrible in the next test cycle.

If you run tests that are too large, you've got to leave a long 
time window for people to test, and someone has to correlate all of
the comments ... it's just a drag.  Small test groups mean you
can run a day of testing once a week and keep moving forward.
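
For what it's worth, the 3-5 figure falls out of a simple saturation model; 
Nielsen's often-cited curve assumes each tester independently finds about 31% 
of the problems:

```python
# Expected share of usability problems found by n testers under
# Nielsen's model: found(n) = 1 - (1 - L)**n, where L ~= 0.31 is the
# average per-tester discovery rate he reports.
L = 0.31

def found(n):
    return 1 - (1 - L) ** n

for n in (1, 3, 5, 15):
    print(n, round(found(n), 2))
# Five testers already surface about 84% of the problems, which is
# why several small, fast iterations beat one big public test.
```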

-Joe

[1] I'll probably out myself as an old fogey here, but :
http://www.useit.com/articles/why-you-only-need-to-test-with-5-users/


Re: [CODE4LIB] CFP: A Librarian's Introduction to Programming Languages

2014-03-25 Thread Joe Hourcle
On Mar 25, 2014, at 9:03 AM, Miles Fidelman wrote:

> Come to think of it, there's nothing there to frame the intent and scope of 
> the book - is it aimed at librarians who write code, or at librarians who are 
> trying to guide people to topical material?

An excellent question, so I'm cc'ing the editors for the book, so maybe they 
can answer.

(I suspect by the languages listed that it's the first one; the second would be 
so broad that it might not be useful ... I'm having a difficult time coming up 
with justifications for using Logo, IDL or Brainfuck in a library [1]).  And 
the mention of "how a specific language can be used to enhance library services 
and resources" might be a clue, too)


> Either way, it sure seems like at least three framing topics are missing:
> - a general overview of programming language types and characteristics (i.e., 
> context for reading the other chapters)
> - a history of programming languages (the family tree, if you will)
> - programming environments, platforms, tools, libraries and repositories - a 
> language's ecosystem probably influences choice of language use as much as 
> the language itself

Agreed on all three ... in some cases, the main justification for using a 
language is the ecosystem (eg, CPAN for Perl).

In some cases, it might be worth just assuming a library -- eg, do you want to 
teach people (ECMA|J(ava)?|Live)Script, or just assume jQuery, so they can get 
up to speed faster?  (yes, I know, you then bring in the jQuery vs. MooTools 
vs. every other JS library, but I think it's safe to say that jQuery is a 
defacto standard these days)


> - "non-language languages" - e.g., sql/nosql, spreadsheet macros and other 
> platforms that one builds on

Agreed on the need for SQL.  NoSQL isn't really a language on its own; I'm not 
aware of any specific general API, so I'd go with XPath & XSLT for discussing 
non-relational data.  Macro languages would be useful (and I'd assume the 
'Basic' proposal was actually for VBA, so you could create more complex MS 
Access databases)

-Joe


[1] okay, maybe Logo in the context of MakerSpaces, but still nothing on the 
other two.

ps.  I haven't trimmed this, so the editors can see some of the other comments 
made.



> Miles Fidelman
> p.s. I wrote a book for ALA Editions, they were great to work with.  The 
> acquisitions editor I worked with is now a Sr. Editor, so I expect they're 
> still good folks to work with.
> 
> Jason Bengtson wrote:
>> I'm also surprised not to see anything about the sql/nosql end of the 
>> equation. Integral to a lot of apps and tools . . . at least from a web 
>> perspective (and probably from others too).
>> 
>> Best regards,
>> 
>> Jason Bengtson, MLIS, MA
>> Head of Library Computing and Information Systems
>> Assistant Professor, Graduate College
>> Department of Health Sciences Library and Information Management
>> University of Oklahoma Health Sciences Center
>> 405-271-2285, opt. 5
>> 405-271-3297 (fax)
>> jason-bengt...@ouhsc.edu
>> http://library.ouhsc.edu
>> www.jasonbengtson.com
>> 
>> NOTICE:
>> This e-mail is intended solely for the use of the individual to whom it is 
>> addressed and may contain information that is privileged, confidential or 
>> otherwise exempt from disclosure. If the reader of this e-mail is not the 
>> intended recipient or the employee or agent responsible for delivering the 
>> message to the intended recipient, you are hereby notified that any 
>> dissemination, distribution, or copying of this communication is strictly 
>> prohibited. If you have received this communication in error, please 
>> immediately notify us by replying to the original message at the listed 
>> email address. Thank You.
>> 
>> On Mar 25, 2014, at 7:39 AM, Ian Ibbotson  wrote:
>> 
>>> Going in the other direction from cobol and fortran -Fair warning - Putting
>>> on java evangelist hat- :) I wonder if it might be worth suggesting to the
>>> authors that they change java into "JVM Languages" and cover off Java,
>>> Scala, Groovy,...(others). We've had lots of success in the GoKB(
>>> http://gokb.org/) and KB+(https://www.jisc-collections.ac.uk/News/kbplus/)
>>> Knowledge Base projects using groovy on grails - Essentially all the
>>> pre-built libraries and enterprise gubbins of Java, but with a more
>>> ruby-esq idiom making it much more readable / less verbose / more
>>> expressive, and integrating nicely with all that existing enterprise
>>> infrastructure to boot.
>>> 
>>> The use of embedded languages in JVMs (Including javascript) means that the
>>> use of Domain Specific Languages are becoming more and more widespread
>>> under JVMs, and this seems (To me) an area where there is some real
>>> advantage to having practitioners with real coding skills - Maybe not the
>>> hardcore systems development stuff but certainly ability to tune and
>>> configure software. Expressing things like business rules in DSLs (EG How
>>> to choose a supplier for an item, or how to ded

Re: [CODE4LIB] Job: PERL PROGRAMMER at The Center for Research Libraries

2014-03-10 Thread Joe Hourcle
On Mar 10, 2014, at 12:19 PM, Lisa Rabey wrote:

> On Mon, Mar 10, 2014 at 11:46 AM, Joe Hourcle
>  wrote:
>> For those looking to hire a Perl programmer, two suggestions:
>> 
>> 1. Don't put it in all caps:
>> 
>>http://www.perl.org/about/style-guide.html
> 
> 
> This is a fair point if they only all-capped Perl, which they didn't;
> they capped the title of the job. I'm assuming they did this for
> formatting reasons in the email, which should have no bearing on
> "who's dabbling" in the community and who is not.

And although I'm normally a fan of trimming down message text
to the relevant parts, you conveniently removed the other
three occurrences of 'PERL' in the posting:

"We are seeking a PERL Programmer to work ..."

"Minimum of 1 year of PERL programming experience"

"Experience using PERL, JAVA or other ..."



> But it also raises the point that if I were a Perl programmer, someone
> nitpicking about email formatting is probably not someone I would want
> to work with.

Right ... I should apologize for top-posting in my last message.
I'm sorry, and I'll try not to do it again.

Thank you for not continuing the top-posting in your reply.

-Joe


Re: [CODE4LIB] Job: PERL PROGRAMMER at The Center for Research Libraries

2014-03-10 Thread Joe Hourcle
For those looking to hire a Perl programmer, two suggestions:

1. Don't put it in all caps:

http://www.perl.org/about/style-guide.html

2. Make sure you post on the Perl jobs board:

http://jobs.perl.org/

-Joe

ps. I have no idea how the Java folks like their language
capitalized, but I suspect it's similar.

pps. On the plus side, it makes it really easy to weed out
  the resumes of those who are only dabbling and not active
  in the community.


On Mar 10, 2014, at 11:35 AM, j...@code4lib.org wrote:

> PERL PROGRAMMER
> The Center for Research Libraries
> Chicago
> 
> Center for Research Libraries (CRL) is a membership
> consortium consisting of the leading academic and research libraries in the
> U.S. and abroad, with a unique and historic collection. A
> recently awarded grant from the Andrew W. Mellon Foundation has enabled the
> CRL to continue and expand its efforts to shape a data-centered international
> strategy for archiving and digitizing historical journals and newspapers.
> 
> 
> We are seeking a PERL Programmer to work with our existing team of librarians
> to further develop and maintain data projects critical to meeting our
> objective. Work primarily involves analyzing and
> manipulating data sets from library and commercial sources to pull out needed
> data and transform it into additional formats for ingest into existing
> databases or tools used for presentation of the data.
> 
> 
> Duties and Responsibilities:
> 
> • Working closely with librarians to analyze and manipulate data sets
> 
> • Creating optimized scalable code
> 
> • Design, build and test tools to analyze data, extract patterns, and
> transform data among various formats as required by project demands.
> 
> • Design and build user-friendly interface for tools.
> 
> 
> Requirements:
> 
> • Strong analytical skills, with experience analyzing dataflow, data patterns
> and work flow
> 
> • Minimum of 1 year of PERL programming experience
> 
> • Experience using PERL, JAVA or other programming languages to normalize text
> and applying API's to harvest or capture data.
> 
> • Ability to collaborate and contribute to a team and work independently
> 
> • Ability to document and explain standards
> 
> • Related degree required
> 
> 
> In addition to professional challenge and the chance to make a creative
> contribution, the CRL offers a competitive salary and exceptional benefits
> package.
> 
> 
> Respond with the title of the position in the subject line
> to: resu...@crl.edu. You may also respond by mail or fax,
> indicating the position you are applying for to:
> 
> 
> Human Resources
> 
> Center for Research Libraries
> 
> 6050 S. Kenwood Ave.
> 
> Chicago, IL 60637
> 
> Fax: 773-955-4545
> 
> 
> An Equal Opportunity Employer m/f/d/v
> 
> 
> 
> Brought to you by code4lib jobs: http://jobs.code4lib.org/job/12932/


Re: [CODE4LIB] Book scanner suggestions redux

2014-03-04 Thread Joe Hourcle
On Mar 3, 2014, at 10:54 AM, Aaron Rubinstein wrote:

> Hi all, 
> 
> We’re looking to purchase a book scanner and I was hoping to get some 
> recommendations from those who’ve had experience.

I don't have experience, but a couple of years back, a group started selling 
kits to make book scanners:

http://diybookscanner.myshopify.com/products/diy-book-scanner-kit


It's $500+shipping, and missing some parts (glass, cameras, paint), but it 
means that instead of carpentry skills, you just need experience assembling 
things.

-Joe


Re: [CODE4LIB] online book price comparison websites?

2014-02-26 Thread Joe Hourcle
On Feb 26, 2014, at 3:14 PM, Jonathan Rochkind wrote:

> Anyone have any recommendations of online sites that compare online prices 
> for purchasing books?
> 
> I'm looking for recommendations of sites you've actually used and been happy 
> with.
> 
> They need to be searchable by ISBN.
> 
> Bonus is if they have good clean graphic design.
> 
> Extra bonus is if they manage to include shipping prices in their price 
> comparisons.


Might be too late, but :

http://isbn.nu/

It doesn't include the shipping prices in their results, though.

The API is just appending the ISBN to the end of the URL, in either the 10- or 13-digit form:

http://isbn.nu/0060853980
http://isbn.nu/9780060853983

-Joe


Re: [CODE4LIB] how to unsubscribe this list?

2014-02-20 Thread Joe Hourcle
On Feb 20, 2014, at 9:14 AM, ya'aQov wrote:

> ​please send to yaaq...@gmail.com
> kind thanks.​


To everyone trying to unsubscribe from mailing lists --

Look at the full SMTP headers, as most mail exploders these days include 
something in there, if they don't attach a .sig with it.

So for code4lib :

List-Help: ,
  
List-Unsubscribe: 
List-Subscribe: 
List-Owner: 
List-Archive: 

-Joe


Re: [CODE4LIB] Question about OAI Harvesting via Perl

2014-01-14 Thread Joe Hourcle
On Jan 14, 2014, at 3:01 PM, Eka Grguric wrote:

> Hi,
> 
> I am a complete newbie to Perl (and to Code4Lib) and am trying to set up a 
> harvester to get complete metadata records from oai-pmh repositories. My 
> current approach is to use things already built as much as possible - 
> specifically the Net::Oai::Harvester 
> (http://search.cpan.org/~esummers/OAI-Harvester-1.0/lib/Net/OAI/Harvester.pm).
>  The code I'm using is located in the synopsis and specific parts of it seem 
> to work with some samples I've tried. For example, if I submit a request for 
> a list of sets to the oai url for arXiv.org (http://arXiv.org/oai2) I get the 
> correct list.
> 
> The error I run into reads "can't call listRecords() on an undefined value in 
> *filename* line *#*". listRecords() seems to have been an issue in past 
> iterations but I'm not sure how to get around it. 
> 
> At the moment it looks like this: 
> ## list all the records in a repository
> my $list = $harvester->listRecords(
>   metadataPrefix => 'oai_dc'
>);
> 
> Any help (or Perl resources) would be appreciated!

The error message you're getting is a sign that '$harvester' (the item that you 
tried calling 'listRecords' on) hasn't been set up properly.

The typical scenarios are that either the object was never called to be created 
or when you tried to create it the function returned undef (undefined value) to 
indicate that something had gone wrong.

How did you initialize it?

-Joe


Re: [CODE4LIB] The lie of the API

2013-12-02 Thread Joe Hourcle
On Dec 2, 2013, at 1:25 PM, Kevin Ford wrote:

> > A key (haha) thing that keys also provide is an opportunity
> > to have a conversation with the user of your api: who are they,
> > how could you get in touch with them, what are they doing with
> > the API, what would they like to do with the API, what doesn’t
> > work? These questions are difficult to ask if they are just a
> > IP address in your access log.
> -- True, but, again, there are other ways to go about this.
> 
> I've baulked at doing just this in the past because it reveals the raw and 
> primary purpose behind an API key: to track individual user usage/access.  I 
> would feel a little awkward writing (and receiving, incidentally) a message 
> that began:
> 
> --
> Hello,
> 
> I saw you using our service.  What are you doing with our data?
> 
> Cordially,
> Data service team
> --

It's better than posting to a website:

We can't justify keeping this API maintained / available,
because we have no idea who's using it, or what they're
using it for.

Or:

We've had to shut down the API because we'd had people
abusing the API and we can't easily single them out as
it's not just coming from a single IP range.

We don't require API keys here, but we *do* send out messages
to our designated community every couple of years with:

If you use our APIs, please send a letter of support
that we can include in our upcoming Senior Review.

(Senior Review is NASA's peer-review of operating projects,
where they bring in outsiders to judge if it's justifiable to
continue funding them, and if so, at what level)


Personally, I like the idea of allowing limited use without
a key (be it number of accesses per day, number of concurrent
accesses, or some other rate limiting), but as someone who has
been operating APIs for years and is *not* *allowed* to track
users, I've seen quite a few times when it would've made my
life so much easier.
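
(A sketch of the kind of keyless rate limiting I mean -- a token
bucket per client IP.  This isn't any particular project's code;
the clock is passed in explicitly so the behavior is testable:)

```python
class TokenBucket:
    """Allow `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, now):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now):
        # refill according to elapsed time, then spend one token if available
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per client, keyed by IP when there's no API key:
buckets = {}

def allowed(ip, now, rate=1.0, capacity=5):
    bucket = buckets.setdefault(ip, TokenBucket(rate, capacity, now))
    return bucket.allow(now)
```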



> And, if you cringe a little at the ramifications of the above, then why do 
> you need user-specific granularity?   (That's really not meant to be a 
> rhetorical question - I would genuinely be interested in whether my notions 
> of "open" and "free" are outmoded and based too much in a theoretical purity 
> that unnecessary tracking is a violation of privacy).

You're assuming that you're actually correlating API calls
to the users ... it may just be an authentication system
and nothing past that.


> Unless the API key exists to control specific, user-level access precisely 
> because this is a facet of the underlying service, I feel somewhere in all of 
> this the service has violated, in some way, the notion that it is "open" 
> and/or "free," assuming it has billed itself as such.  Otherwise, it's "free" 
> and "open" as in Google or Facebook.

You're also assuming that we've claimed that our services
are 'open'.  (mine are, but I know of plenty of them that
have to deal with authorization, as they manage embargoed
or otherwise restricted items).

Of course, you can also set up some sort of 'guest'
privileges for non-authenticated users so they just wouldn't
see the restricted content.


> All that said, I think a data service can smooth things over greatly by not 
> insisting on a developer signing a EULA (which is essentially what happens 
> when one requests an API key) before even trying the service or desiring the 
> most basic of data access.  There are middle ground solutions.

I do have problems with EULAs ... one in that we have to
get things approved by our legal department, second in that
they're often written completely one-sided and third in
that they're often written assuming personal use.

Twitter and Facebook had to make available alternate EULAs
so that governments could use them ... because you can't
hold the person who signed up for the account responsible
for it.  (and they don't want it 'owned' by that person
should they be fired, etc.)

... but sometimes they're less restrictive ... more TOS
than EULA.  Without it, you've got absolutely no sort of
SLA ... if they want to take down their API, or block you,
you've got no recourse at all.

-Joe


Re: [CODE4LIB] The lie of the API

2013-12-01 Thread Joe Hourcle
On Dec 1, 2013, at 11:12 PM, Simon Spero wrote:

> On Dec 1, 2013 6:42 PM, "Joe Hourcle"  wrote:
> 
>> So that you don't screw up web proxies, you have to specify the 'Vary'
> header to tell which parameters you consider significant so that it knows
> what is or isn't cacheable.
> 
> I believe that if a Vary isn't specified, and the content is not marked as
> non cachable,  a cache must assume Vary:*, but I might be misremembering

That would be horrible for caching proxies to assume that nothing's
cacheable unless it said it was.  (as typically only the really big
websites or those that have seen some obvious problems bother with
setting cache control headers.)

I haven't done any exhaustive tests in many years, but I was noticing
that proxies were starting to cache GET requests with query strings,
which bothered me -- it used to be that anything that was an obvious
CGI wasn't cached.  (I guess that enough sites use it that a proxy
has to assume the sites aren't stateful, and that the parameters
in the URL are enough information for hashing)



>> (who has been managing web servers since HTTP/0.9, and gets annoyed when
> I have to explain to our security folks each year  why I don't reject
> pre-HTTP/1.1 requests or follow the rest of  the CIS benchmark
> recommendations that cause our web services to fail horribly)
> 
> Old school represent (0.9 could out perform 1.0 if the request headers were
> more than 1 MTU or the first line was sent in a separate packet with nagle
> enabled). [Accept was a major cause of header bloat].

Don't even get me started on header bloat ... 

My main complaint about HTTP/1.1 is that it requires clients to support
chunked encoding, and I've got to support a client that's got a buggy
implementation.  (and then my CGIs that serve 2GB tarballs start
failing, and it's calling a program that's not smart enough to look
for SIGPIPE, so I end up with a dozen of 'em going all stupid and
sucking down CPU on one of my servers)

Most people don't have to support a community written HTTP client,
though.  (and the one alternative HTTP client in IDL doesn't let me
interact w/ the HTTP headers directly, so I can't put a wrapper
around it to extract the tarball's filename from the Content-Disposition
header)
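
(As an aside, for clients that do expose the raw headers, pulling
a filename out of Content-Disposition is only a few lines.  A
sketch using Python's stdlib email parser -- the helper name is
mine:)

```python
from email.message import Message

def filename_from_disposition(header_value):
    """Pull the filename parameter out of a Content-Disposition header."""
    msg = Message()
    msg['Content-Disposition'] = header_value
    return msg.get_filename()   # handles quoting for you

# e.g. filename_from_disposition('attachment; filename="data.tar.gz"')
```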

-Joe

ps.  yep, still having writer's block on posters.


Re: [CODE4LIB] The lie of the API

2013-12-01 Thread Joe Hourcle
On Dec 1, 2013, at 9:36 PM, Barnes, Hugh wrote:

> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Joe 
> Hourcle
> 
>>> (They are on Wikipedia so they must be real.)
> 
>> Wikipedia was the first place you looked?  Not IETF or W3C?
>> No wonder people say libraries are doomed, if even people who work in 
>> libraries go straight to Wikipedia.
> 
> It was a humorous aside, regrettably lacking a smiley.

Yes, a smiley would have helped.

It also doesn't help that there used to be a website out there
named 'ScoopThis'.  They started as a wrestling parody site, but
my favorite part was their advice column from 'Dusty the Fat,
Bitter Cat'.

I bring this up because their slogan was "cuz if it’s on the net,
it’s got to be true" ... so I twitch a little whenever someone
says something similar to that phrase.

(unfortunately, the site's gone, and archive.org didn't cache
them, so you can't see the photoshopped pictures of Dusty
at Woodstock '99 or the Rock's cooking show.  They started up
a separate website for Dusty, but when they closed that one
down, they put up a parody of a porn site, so you probably
don't want to go looking for it)


> I think that comment would be better saved to pitch at folks who cite and 
> link to w3schools as if authoritative. Some of them are even in libraries.

Although I wish that w3schools would stop showing up so highly
in searches for javascript methods & css attributes, they
did have a time when they were some of the best tutorials out
there on web-related topics.  I don't know if I can claim that
to be true today, though.


> Your other comments were informative, though. Thank you :)

I try ... especially when I'm procrastinating on doing posters
that I need to have printed by Friday.

(but if anyone has any complaints about data.gov or other
federal data dissemination efforts, I'll be happy to work
them in)

-Joe


Re: [CODE4LIB] The lie of the API

2013-12-01 Thread Joe Hourcle
On Dec 1, 2013, at 7:57 PM, Barnes, Hugh wrote:

> +1 to all of Richard's points here. Making something easier for you to 
> develop is no justification for making it harder to consume or deviating from 
> well supported standards.
> 
> [Robert]
>> You can't 
>> just put a file in the file system, unlike with separate URIs for 
>> distinct representations where it just works, instead you need server 
>> side processing.
> 
> If we introduce languages into the negotiation, this won't scale.

It depends on what you qualify as 'scaling'.  You can configure
Apache and some other servers so that you pre-generate files such
as :

index.en.html
index.de.html
index.es.html
index.fr.html

... It's even the default for some distributions.

Then, depending on what the Accept-Language header is sent,
the server returns the appropriate response.  The only issue
is that the server assumes that the 'quality' of all of the
translations are equivalent.

You know that 'q=0.9' stuff?  There's actually a scale in
RFC 2295 that equates the different quality values to how much
content is lost in that particular version:

  Servers should use the following table as a guide when assigning source
  quality values:

 1.000  perfect representation
 0.900  threshold of noticeable loss of quality
 0.800  noticeable, but acceptable quality reduction
 0.500  barely acceptable quality
 0.300  severely degraded quality
 0.000  completely degraded quality
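
On the request side, the same q scale shows up in the
Accept-Language header itself.  A minimal sketch of how a server
might pick a variant (the helper names are mine; a real
implementation also has to handle wildcards and language-range
prefix matching):

```python
def parse_accept_language(header):
    """Parse 'de, en;q=0.5' into [('de', 1.0), ('en', 0.5)]."""
    prefs = []
    for item in header.split(','):
        parts = [p.strip() for p in item.split(';')]
        q = 1.0
        for param in parts[1:]:
            if param.startswith('q='):
                q = float(param[2:])
        prefs.append((parts[0], q))
    return prefs

def choose_variant(available, header):
    """Return the available language the client most prefers."""
    for lang, q in sorted(parse_accept_language(header), key=lambda p: -p[1]):
        if q > 0 and lang in available:
            return lang
    return available[0]   # server-chosen default when nothing matches
```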





> [Robert]
>> This also makes it much harder to cache the 
>> responses, as the cache needs to determine whether or not the 
>> representation has changed -- the cache also needs to parse the 
>> headers rather than just comparing URI and content.  
> 
> Don't know caches intimately, but I don't see why that's algorithmically 
> difficult. Just look at the Content-type of the response. Is it harder for 
> caches to examine headers than content or URI? (That's an earnest, perhaps 
> naïve, question.)

See my earlier response.  The problem is without a 'Vary' header or
other cache-control headers, caches may assume that a URL is a fixed
resource.

If it were to assume that was static, then it wouldn't matter what
was sent for the Accept, Accept-Encoding or Accept-Language ... and
so the first request proxied gets cached, and then subsequent
requests get the cached copy, even if that's not what the server
would have sent.


> If we are talking about caching on the client here (not caching proxies), I 
> would think in most cases requests are issued with the same Accept-* headers, 
> so caching will work as expected anyway.

I assume he's talking about caching proxies, where it's a real
problem.


> [Robert]
>> Link headers 
>> can be added with a simple apache configuration rule, and as they're 
>> static are easy to cache. So the server side is easy, and the client side is 
>> trivial.
> 
> Hadn't heard of these. (They are on Wikipedia so they must be real.) What do 
> they offer over HTML <link> elements populated from the Dublin Core Element 
> Set?

Wikipedia was the first place you looked?  Not IETF or W3C?
No wonder people say libraries are doomed, if even people who work
in libraries go straight to Wikipedia.


...


oh, and I should follow up to my posting from earlier tonight --
upon re-reading the HTTP/1.1 spec, it seems that there *is* a way to
specify the authoritative URL returned without an HTTP round-trip,
Content-Location :

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.14

Of course, it doesn't look like my web browser does anything with
it:

http://www.w3.org/Protocols/rfc2616/rfc2616
http://www.w3.org/Protocols/rfc2616/rfc2616.html
http://www.w3.org/Protocols/rfc2616/rfc2616.txt

... so you'd still have to use Location: if you wanted it to 
show up to the general public.

-Joe


Re: [CODE4LIB] The lie of the API

2013-12-01 Thread Joe Hourcle
On Dec 1, 2013, at 3:51 PM, LeVan,Ralph wrote:

> I'm confused about the supposed distinction between content negotiation and 
> explicit content request in a URL.  The reason I'm confused is that the 
> response to content negotiation is supposed to be a content location header 
> with a URL that is guaranteed to return the negotiated content.  In other 
> words, there *must* be a form of the URL that bypasses content negotiation.  
> If you can do content negotiation, then you should have a URL form that 
> doesn't require content negotiation.

There are three types of content negotiation discussed in HTTP/1.1.  The
one that most gets used is 'transparent negotiation' which results in
there being different content served under a single URL.

Transparent negotiation schemes do *not* redirect to a new URL to allow
the cache or browser to identify the specific content returned.  (this
would require an extra round trip, as you'd have to send a Location:
header to redirect, then have the browser request the new page)

So that you don't screw up web proxies, you have to specify the 'Vary'
header to tell which parameters you consider significant so that it
knows what is or isn't cacheable.  So if you might serve different
content based on the Accept and Accept-Encoding would return:

Vary: Accept, Accept-Encoding

(Including 'User-Agent' is problematic because of some browsers
that pack in every module + the version in there, making there be so
many permutations that many proxies will refuse to cache it)
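
Concretely, a caching proxy folds the headers named in Vary into
its cache key, something like this sketch (real caches normalize
far more aggressively; request header names here are assumed to be
pre-lowercased):

```python
def cache_key(url, vary, request_headers):
    """Build a cache key from the URL plus every request header named
    in the Vary response header.  A request that omits a varied header
    hashes the absence (empty string)."""
    parts = [url]
    for name in vary.split(','):
        name = name.strip().lower()
        if name:
            parts.append('%s=%s' % (name, request_headers.get(name, '')))
    return '|'.join(parts)
```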

-Joe

(who has been managing web servers since HTTP/0.9, and gets 
annoyed when I have to explain to our security folks each year
why I don't reject pre-HTTP/1.1 requests or follow the rest of
the CIS benchmark recommendations that cause our web services to
fail horribly)


Re: [CODE4LIB] calibr: a simple opening hours calendar

2013-11-27 Thread Joe Hourcle
On Nov 27, 2013, at 11:01 AM, Jonathan Rochkind wrote:

> Many of our academic libraries have very byzantine 'hours' policies.
> 
> Developing UI that can express these sensibly is time-consuming and 
> difficult; by doing a great job at it (like Sean has), you can make the 
> byzantine hours logic a lot easier for users to understand... but you can 
> still only do so much to make convoluted complicated library hours easy to 
> deal with and understand for users.
> 
> If libraries can instead simplify their hours, it would make things a heck of 
> a lot easier on our users. Synchronize the hours of the different parts of 
> the library as much as possible. If some service points aren't open the full 
> hours of the library, if you can make all those service points open the 
> _same_ reduced hours, not each be different. Etc.
> 
> To some extent, working on hours displays to convey byzantine hours 
> structures can turn into the familiar case of people looking for 
> technological magic bullet solutions to what are in fact business and social 
> problems.

I agree up to a point.

When I was at GWU, we were running what was the most customized
version of Banner (a software system for class registration, HR,
etc.)  Some of the changes were to deal with rules that no one
could come up with a good reason for, and they should have been
simplified.  Other ones were there for a legitimate reason.*

You should take these sorts of opportunities to ask *why* the
hours are so complicated, and either document the reason for it,
or look to simplify it.

Did a previous librarian have some regularly scheduled thing
every Tuesday afternoon, and that's why one section closes
down early on Tuesdays?  If they're not there anymore, you can
change that.

Does one station requiring some sort of a shutdown / closing
procedure that takes a significant amount of time, and they
close early so they're done by closing time?  Or do they open
late because they have similar issue setting up in the morning,
and it's unrealistic to have them come in earlier than everyone
else?  Maybe there's something else that could be done to
improve and/or speed up the procedures.**

Has there been historically less demand for certain types of
books at different times of the day?  Well, that's going to be
hard to verify, as people have now adjusted to the library's
hours, rather than visa-versa ... but it's a legitimate reason
to not keep service points open if no one's using them.

... but I would suggest that you don't use criteria like the
US Postal Service's recommendation to remove postboxes -- they
based it on number of pieces of mail, and ended up removing
them all in some areas.

...

Anyway, the point I'm making -- libraries are about service.
Simplification might make it easier to keep track of things,
but it doesn't necessarily make for better service.

-Joe

* Well, legitimate to someone, at least.  For instance, the
development office had a definition of "alumni" that included
donors who might not've actually attended the university.

** When I worked for the group that ran GW's computer labs,
some days I staffed a desk that we had over in the library ...
but I had to clock in at the main office, then walk over to
other building, and once the shift was over, walk back to the
main office to clock out.  I got them to designate one of the
phones in the library computer lab as being allowed to call
into the time clock system, so I could stop wasting so much
time ... then they decided to just stop having staff over
there.



> On 11/27/13 9:25 AM, Sean Hannan wrote:
>> I'd argue that library hours are nothing but edge cases.
>> 
>> Staying open past midnight is actually a common one. But how do you deal
>> with multiple library locations? Multiple service points at multiple
>> library locations? Service points that are 'by appointment only' during
>> certain days/weeks/months of the year? Physical service points that are
>> under renovation (and therefore closed) but their service is being carried
>> out from another location?
>> 
>> When you have these edge cases sorted out, how do you display it to users
>> in a way that makes any kind of sense? How do you get beyond shoehorning
>> this massive amount of data into outmoded visual paradigms into something
>> that is easily scanned and processed by users? How do you make this data
>> visualization work on tablets and phones?
>> 
>> The data side of calendaring is one thing (and for as standard and
>> developed as they are, iCal and Google Calendar's data formats don't get it
>> 100% correct as far as I'm concerned). Designing the interaction is wholly
>> another.
>> 
>> It took me a good two or three weeks to design the interaction for our new
>> hours page (http://www.library.jhu.edu/hours.html) over the summer. There
>> were lots of iterations, lots of feedback, lots of user testing. "User
>> testing? Just for an hours page?" Yes. It's one of our most highly sought
>> pieces of information on 

Re: [CODE4LIB] Tab delimited file with Python CSV

2013-11-25 Thread Joe Hourcle
On Nov 25, 2013, at 1:05 PM, Jonathan Rochkind wrote:

> Ah, but what if the data itself has tabs!  Doh!
> 
> It can be a mess either way.  There are standards (or conventions?) for 
> escaping internal commas in CSV -- which doesn't mean the software that was 
> used to produce the CSV, or the software you are using to read it, actually 
> respects them.

You don't have to escape the commas, you just have to double-quote the string.  
If you want to have a double quote, you put two in a row, e.g.:

"He said, ""hello"""


> But I'm not sure if there are even standards/conventions for escaping tabs in 
> a tab-delimited text file?

No official ones that I'm aware of.  I've seen some parsers that will 
consider a backslash before a delimiter to be an escape, but I don't know if 
there's an official spec for tab- / pipe- / whatever-delimited text.



> Really, the lesson to me is that you should always consider use an existing 
> well-tested library for both reading and writing these files, whether CSV or 
> tab-delimited -- even if you think "Oh, it's so simple, why bother with 
> that."  There will be edge cases that you will discover only when they cause 
> bugs, possibly after somewhat painful debugging. A well-used third-party 
> library is less likely to have such edge case bugs.

Agreed, but in this case, it might be easier to bypass the library.  (if you 
were using a library, you'd have to shift an empty element to the front of each 
row, then output it).


> I am more ruby than python; in ruby there is a library for reading and 
> writing CSV in the stdlib. 
> http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html

And I'm more perl, and generally lazy for this simple of an edit:

perl -pi -e 's/^/\t/' file_to_convert

(the '-p' tells it to apply the transformation to each line, '-i' tells it to 
edit the file in place (use '-i.bak' if you want a '.bak' backup copy kept), 
and "-e 's/^/\t/'" puts a tab at the front of each line)

-Joe
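The conventions above (quote the field, double the embedded quotes) are exactly what Python's stdlib csv module implements, for tab-delimited output as well as comma. A quick illustration:

```python
import csv
import io

row = ['He said, "hello"', "has\ttab", "plain"]

# Comma-delimited: the first field contains both a comma and a quote,
# so it gets quoted and the embedded quotes are doubled.
buf = io.StringIO()
csv.writer(buf).writerow(row)

# Tab-delimited: same writer, different delimiter; now the field with
# a literal tab needs quoting instead.
tbuf = io.StringIO()
csv.writer(tbuf, delimiter="\t").writerow(row)

# The matching reader round-trips it, un-doubling the quotes.
back = next(csv.reader(io.StringIO(tbuf.getvalue()), delimiter="\t"))
```

(The writer's default line terminator is CRLF, which matches what RFC 4180 specifies for CSV.)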


Re: [CODE4LIB] Question for Institutional Repository Folks

2013-10-28 Thread Joe Hourcle
On Oct 28, 2013, at 1:50 PM, Matthew Sherman wrote:

> We use DSpace for our repository so any editing to the PDFs have to be done
> in Acrobat before uploading.  I can add a note to the metadata in DSpace,
> but I am not sure if that fulfills the permissions agreement.  I was
> recently hired for this position so I do not know who provided us the file
> to upload in the first place.  That is why I am asking if anyone else has
> dealt with this since I am unsure if I can ever get the password.

I'm not an IR person, but I know that PDFs are effectively a container.

It *might* be possible to create a PDF that had a page w/ the notice, then
insert the locked PDF.

A quick search on the topic suggests there might be problems ... I don't
know whether the default for a locked document is to disallow inserting
it:

http://forums.adobe.com/thread/874857

(the last suggestion was trying it as an attachment, and no follow up
after that)

-Joe



> 
> On Mon, Oct 28, 2013 at 1:18 PM, Jim DelRosso  wrote:
> 
>> Matt,
>> 
>> Does the software you use generate cover pages that you can edit? Or can
>> you add the note to the metadata page associated with the document?
>> 
>> Jim
>> 
>> *Jim DelRosso, MPA, MSLIS
>> Digital Projects Coordinator*
>> *Hospitality, Labor, and Management Library*
>> Catherwood Library
>> ILR School
>> Cornell University
>> 239D Ives Hall
>> Ithaca, NY 14853
>> p 607.255.8688
>> f 607.255.9641
>> e jd...@cornell.edu
>> www.ilr.cornell.edu
>> *Advancing the World of Work*
>> 
>> 
>> On Mon, Oct 28, 2013 at 1:13 PM, Matthew Sherman
>> wrote:
>> 
>>> Hello Code4libbers,
>>> 
>>> I had a question for others who work with institutional repositories.
>>> I have a file given by a professor that I have permission to post if I
>>> add a note to the PDF, but the file is password locked.  Has anyone else
>>> run into this problem before?  Can anyone give me some advice on how I can
>>> edit this to add the required note to the top of the PDF?  Any advice is
>>> welcome.
>>> 
>>> Matt Sherman
>>> 
>>> 
>>> 
>> 


Re: [CODE4LIB] Faculty publication database

2013-10-25 Thread Joe Hourcle
On Oct 25, 2013, at 11:35 AM, Alevtina Verbovetskaya wrote:

> Hi guys,
> 
> Does your library maintain a database of faculty publications? How do you do 
> it?
> 
> Some things I've come across in my (admittedly brief) research:
> - RSS feeds from the major databases
> - RefWorks citation lists
> 
> These options do not necessarily work for my university, made up of 24 
> colleges/institutions, 6,700+ FT faculty, and 270,000+ degree-seeking 
> students.
> 
> Does anyone have a better solution? It need not be searchable: we are just 
> interested in pulling a periodical report of articles written by our 
> faculty/students without relying on them self-reporting 
> days/weeks/months/years after the fact.

If you're forced to rely on self-reporting, one of the solutions
that I've seen is to add a few more features and introduce it as a
'CV Builder' or some sort of 'Faculty Directory' ... so the faculty
members get some benefit back out of it, and it's more public so they
have an interest in keeping it updated.

I'd also recommend talking to the individual colleges -- it's possible
that some of them already maintain databases, either for the whole
college or at the departmental level.  They might be willing to keep
the data populated if you provide the hosted service.

(and the tenure-track folks have a vested interest in making sure
their records are kept up-to-date).

In looking through the other recommendations -- I didn't see ORCID or
ResearcherID mentioned ... I know they're not exhaustive, but it might
be possible to have a way to automate dumps from them -- so the faculty
member keeps ORCID up-to-date, and you periodically generate dumps from
ORCID for all of your faculty.  The last time I checked it, ORCID 
found all of my ASIS&T work ... but missed all of the stuff that I've
published in space physics and data informatics.  (admittedly, those
weren't peer-reviewed, but neither were most of the ASIS&T ones)

-Joe
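If the ORCID route looks viable, the periodic dump is easy to script against their public API. A sketch with Python's stdlib follows; note that the v3.0 endpoint path and the JSON Accept header are assumptions to check against ORCID's current API documentation:

```python
import urllib.request

ORCID_API = "https://pub.orcid.org/v3.0"  # assumed public-API base URL

def works_request(orcid_id):
    """Build a request for one researcher's list of works, as JSON."""
    return urllib.request.Request(
        "%s/%s/works" % (ORCID_API, orcid_id),
        headers={"Accept": "application/json"},
    )

# urllib.request.urlopen(works_request(...)) would fetch the works
# summary; loop over your faculty's ORCID iDs for the periodic report.
req = works_request("0000-0002-1825-0097")
```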


[CODE4LIB] Please use HTTP 503 (was: Library of Congress)

2013-10-01 Thread Joe Hourcle
On Oct 1, 2013, at 9:52 AM, Nick Ruest wrote:

> Welp. XSDs are redirecting. See[1].
> 
> -nruest
> 
> [1] http://www.loc.gov/standards/mods/v3/mods-3-4.xsd

(*@#!@#%

I tried telling people around here to use HTTP 503 ... but GSA sent out 
advice to use 302s ... 

If there are any people who are still in the process of their 'orderly
shutdown' ... please send HTTP 503 (Service Unavailable) for requests,
so that search engines don't completely screw things up while we're
shut down, or ignorant systems try to process the error page as if it
were real content.

-Joe


Apache : http://stackoverflow.com/q/622466/143791
IIS : http://serverfault.com/q/483145/14119
Nginx : http://stackoverflow.com/q/5984270/143791
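
For reference, the Apache recipe from that first link boils down to something like the following (a sketch assuming mod_rewrite and mod_headers are enabled; the maintenance-page path and the one-day Retry-After are placeholders):

```apache
# Answer every request with 503 plus a human-readable maintenance page,
# but let the maintenance page itself through so it can be served.
RewriteEngine On
RewriteCond %{REQUEST_URI} !=/maintenance.html
RewriteRule ^ - [R=503,L]
ErrorDocument 503 /maintenance.html

# Hint to crawlers when to retry (in seconds), so they don't de-index.
Header always set Retry-After "86400"
```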



> On 13-10-01 09:36 AM, John Palmer wrote:
>> Furloughs don't officially start until noon local time Tuesday, so they may
>> be in the process of receiving instructions for shutdown.
>> 
>> 
>> On Tue, Oct 1, 2013 at 6:21 AM, Doran, Michael D  wrote:
>> 
 As far as I can tell the LOC is up and the offices are closed.
>>> HORRAY!!
 Let's celebrate!
>>> 
>>> Before we start celebrating, let's consider our friends and colleagues at
>>> the LOC (some of who are code4lib people) who aren't able to work and
>>> aren't getting paid starting today.
>>> 
>>> -- Michael
>>> 
>>> # Michael Doran, Systems Librarian
>>> # University of Texas at Arlington
>>> # 817-272-5326 office
>>> # 817-688-1926 mobile
>>> # do...@uta.edu
>>> # http://rocky.uta.edu/doran/
>>> 
 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Riley Childs
 Sent: Tuesday, October 01, 2013 5:28 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: [CODE4LIB] Library of Congress
 
 As far as I can tell the LOC is up and the offices are closed.
>>> HORRAY!!
 Let's celebrate!
 
 Riley Childs
 Junior and Library Tech Manager
 Charlotte United Christian Academy
 +1 (704) 497-2086
 Sent from my iPhone
 Please excuse mistakes
>>> 


[CODE4LIB] Government shutdown (was: Library of Congress)

2013-10-01 Thread Joe Hourcle
On Oct 1, 2013, at 7:54 AM, Kimberly Silk wrote:

> Even a list of what's up and down would be helpful, if anyone is inclined. 
> 
> Sent from Kim's iPhone
> 
>> On Oct 1, 2013, at 7:19 AM, "BWS Johnson"  wrote:
>> 
>> Salvete!
>> 
>>> As far as I can tell the LOC is up and the offices are closed. HORRAY!! 
>>> Let's celebrate!
>> 
>> 
>>Yeah, I guess the website folks haven't yet got the memo.
>> 
>> http://www.loc.gov/today/pr/2013/13-A06.html
>> 
>>I suppose someone that's bored on this list might generate a who's up and 
>> who's down app for the GOVDOCs folks. :)

My understanding is that some sort of a list will be posted at :

http://notice.usa.gov/


Anything that's funded directly (eg, the health insurance stuff that people are 
up in arms about) will stay open, as will anything considered necessary for 
human safety & protecting property.

... but we're only allowed to do the minimum amount necessary.  So I have to keep 
space weather data flowing, but it's only to patch up things if they break.* 

-Joe

* But there's the question of how I'd know when they break ... as I'm not 
supposed to monitor ... the contingency plans from 2 years ago were that one 
person on our team would check the logs daily, and if something went wrong, 
call in someone else.  (as they weren't allowed to call themselves into work) 


Re: [CODE4LIB] Way to record usage of tables/rooms/chairs in Library

2013-08-16 Thread Joe Hourcle
On Aug 16, 2013, at 9:52 AM, Ian Walls wrote:

> Suma is the most practical and reliable way to do this right now, I think.
> 
> I've been investigating using a sensor network, but there are a lot of
> limits on the accuracy of PIR, and trip-lasers are low enough and require
> enough power that they'd be troublesome to maintain in a busy undergraduate
> environment.
> 
> One idea was to use an array of sensors:  PIR for motion, microphone for
> noise level and piezo/something similar for vibration.  The thought is that
> elevated levels of these 3 measurements should correspond to "high
> activity".  The placement and calibration of the sensors, though, would be
> key, and you'd need to do some thorough spot checking with Suma or something
> similar in order to be confident that what you're measuring (motion, noise
> and vibration) actually correlate to number of people.
> 
> The sensors would also need to be made out of cheap enough materials and use
> low-congestion wireless frequencies in order to be practical.  Balancing
> this with accuracy may never happen... but it would certainly be a fun
> experiment!


If you're going to take the sensor approach, and it's just a matter of
if there are bodies in specific places, you *might* be able to do it by
modifying cheap webcams.

Many are sensitive in infrared, so you take the IR filter out, and then
add a visible filter.

Position the cameras so that you have coverage of the area you care about,
have them take a picture at whatever times you care about, and then it's
just looking for hot spots.

(although of course, if you do this, it'd be just as easy for someone
to review security camera footage, if you have coverage in the places
you care about; the IR might be easier to automate the counting, though,
if you have someone who's good with automated image analysis)

And if it's just a matter of activity counting -- you might be able
to see if your wireless access points can tell how many items they're
in contact with, and use that as a proxy.

-Joe


Re: [CODE4LIB] locking app for iPads

2013-07-25 Thread Joe Hourcle
On Jul 25, 2013, at 3:52 PM, Cheryl Kohen wrote:

> Dear Fellow Techs,
> 
> We're looking to create a circulation policy for iPads (gen 4) in the
> Learning Commons, and were wondering about an app that will lock the device
> after a specific amount of time (3-4 hours).  The idea is if a student
> does, in fact, steal the device, they will be locked out of actually
> utilizing it.  Has anyone heard of something like this?

I don't know of a time-sensitive one, but Apple's "Find My iPad" (or iPhone), 
has an option to remotely lock a device:

https://www.apple.com/icloud/features/find-my-iphone.html

I suspect it needs a network connection to send the signal to lock.

I don't know if it'll stop anyone who can jailbreak the device, but it would 
hopefully stop the person attempting to 'borrow' it long-term.  (and you can 
track where it is, if it's a device with GPS)

-Joe


Re: [CODE4LIB] StackExchange reboot?

2013-07-08 Thread Joe Hourcle
On Jul 8, 2013, at 3:50 PM, Christie Peterson wrote:

> I agree with both Shaun and Galen's points; when you're asking a "how to do X 
> with tool Y" type of question, SE is a great forum. Like Christina, I've 
> mostly encountered SE when Googling for answers to these types of questions.
> 
> However, for the reasons that Henry and Gary mentioned, I was disappointed in 
> the Digital Preservation SE experience. At the request of one of the SE 
> organizers, I posted a question there that I had also posted to a listserv. 
> It was flagged for not being in the proper form, but I have no idea how I 
> could have framed it properly for SE because it simply wasn't a question that 
> had a single answer. I wanted discussion. Digital Preservation in particular 
> is a developing field and I was trying to gauge opinions and currently 
> evolving best practices. Somewhat ironically given the potential value of the 
> commenting and upvoting mechanism, SE did not prove to be a good forum for 
> this.
> 
> There may be some value to having a code4lib SE instance that answers 
> questions of the "how to do X with tool Y" type and similar for the reasons 
> that Shaun and Galen state. But unless the community standards about what 
> makes a "good" SE question change radically, I don't see it being an 
> attractive or useful forum for the more open-ended, discussion/opinion type 
> questions that people often post to library, digital preservation and other 
> listservs.


I actually just responded to this issue the other day on the Open Data SE site:

http://meta.opendata.stackexchange.com/q/126/263

Back when Cooking SE started (~2.5 years ago), questions with multiple possible 
answers were considered valid.  They didn't tend to like polls ('what's the best 
...') but questions about possibilities of how to deal with problems were 
acceptable.  I'd link to some of them, but there have since been a few people 
who go around and vote to close every question they don't like, even if they've 
gotten a dozen or more upvotes.

Here's one instead -- not even really a question -- that's ranked in the top 10 
'questions' on the cooking site:

http://cooking.stackexchange.com/q/784/67

Personally, I'm of the opinion that there are *very* few problems that only 
have a single solution, or a 'best' solution.  What they really tend to reward 
people for is coming up with a plausible, moderately detailed answer quick 
enough.  I've seen a number get marked as the 'best answer' within 30 min of 
the question being asked where the answer from my point of view was just plain 
wrong.

I do see a use for the sort of things that might've once been considered 
'community wiki' ... what books can I recommend to a 3rd grader who is 
interested in science fiction?  (I've cheated before and worded them like 
'where can I find a list of books to recommend ...')

It *might* be possible to get enough like-minded people involved to ensure that 
if anyone attempts to close reasonable questions we can get them re-opened 
quickly ... but I'd like to recommend changing the scope up front to museums, 
libraries & archives.  I don't know that the more practical 'library' and the 
abstract/academic 'library science' communities really mesh all that well.

And I should probably go get some sleep as I write e-mail that's even more 
incoherent than typical when I've only gotten ~8hrs sleep over the last 3 days.

-Joe


Re: [CODE4LIB] Lightweight Autocomplete Application

2013-07-08 Thread Joe Hourcle
On Jul 8, 2013, at 10:37 AM, Anderson, David (NIH/NLM) [E] wrote:

> I'm looking for a lightweight autocomplete application for data entry. Here's 
> what I'd like to be able to do:
> 
> 
> * Import large controlled vocabularies into the app
> 
> * Call up the app with a macro wherever I'm entering data
> 
> * Begin typing in a term from the vocabulary, get a list of 
> suggestions for terms
> 
> * Select a term from the list and have it paste automatically into my 
> data entry field
> 
> Ideally it would load and suggest terms quickly. I've looked around, but 
> nothing really stands out. Anyone using anything like this?


Is this web-based?

If not, do you have control of the software that you're entering the data into?

If so, what language is it in?

If not, what OS are you using?


-Joe
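Whatever the answers turn out to be, the lookup itself is small enough to sketch: keep a sorted copy of the vocabulary and do a binary-search prefix scan. Python stdlib only; the class and the sample terms below are made up for illustration:

```python
import bisect

class PrefixSuggester:
    """Suggest completions from a controlled vocabulary as the user types."""

    def __init__(self, terms):
        # Sort a lowercased copy once at load time.
        self.terms = sorted(t.lower() for t in terms)

    def suggest(self, prefix, limit=10):
        """Return up to `limit` vocabulary terms starting with `prefix`."""
        prefix = prefix.lower()
        i = bisect.bisect_left(self.terms, prefix)
        out = []
        while (i < len(self.terms)
               and self.terms[i].startswith(prefix)
               and len(out) < limit):
            out.append(self.terms[i])
            i += 1
        return out

s = PrefixSuggester(["Anatomy", "Anemia", "Antibodies", "Antibiotics", "Aorta"])
```

Even a vocabulary of tens of thousands of terms answers quickly this way; the hard part is the macro / paste-into-any-field integration, which is why the environment questions above matter.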


Re: [CODE4LIB] Code4Lib 2014: Save the dates!

2013-06-29 Thread Joe Hourcle
On Jun 29, 2013, at 7:16 AM, BWS Johnson wrote:

> Salvete!
> 
> 
>> I am happy to announce that we have secured the venue and dates for
>> Code4Lib 2014!  The conference will be held at the Sheraton Raleigh Hotel
>> in downtown Raleigh, NC on March 24 - 27, 2014.  Preconferences will be
>> held Monday March 24, and the main conference on Tuesday March 25 - 27.
>> 
> 
>  Hooray, that's sort of close. Maybe I'll be able to pit fight my own 
> place next year.
> 
> 
>> Finally, the hotel has the capacity to host all of the attendees, and we've
>> negotiated a rate of $159/night that includes wireless access in the hotel
>> rooms.  Hotel reservations will be able to made after you register using
>> the information provided in your registration confirmation.  We will be
>> publishing more details as become available.
> 
> 
>  Ruh oh. This was rather shocking. Perhaps you might wish to show them a 
> hotels.com search, which puts your $159 just over the Hilton and about double 
> other places in the vicinity. I'm sure it's nice and all that, but uh, 
> perhaps they would be willing to come down seeing as how we're sending a 
> boatload of traffic their way.

The government per-diem rate for Raleigh is $91 per night:

http://www.gsa.gov/portal/category/100120

I have no idea if that can be used for negotiations at all.

For some reason, they're not showing any federal government rates when I 
searched, but they're offering state government employees rooms at $64/night.  
(you might have to pay extra for the wifi, though)  I highly suggest that 
people who work for public universities or libraries inquire about getting that 
rate.



-Joe

(even though the state of maryland makes me pay into the state retirement 
system because I'm a municipal elected official, they won't issue me a state ID 
card, so I can't get the state rates when traveling, which are typically better 
than the federal rates ... I've actually debated about if it makes sense to 
work for 3 years in a real state job, then claim a pension based on my 'top 3 
years' of pay times the number of years worked)


Re: [CODE4LIB] DOI scraping

2013-05-21 Thread Joe Hourcle
On May 21, 2013, at 9:40 PM, Fitchett, Deborah wrote:

> Joe and Owen--
> 
> Thanks for the ideas!
> 
> It's a bit of the opposite goal to LibX, in that rather than having a 
> title/DOI/whatever from some random site and wanting to get to  the full-text 
> article, I'm looking at the use case of academics who are already viewing the 
> full-text article and want a link that they can share with students.  Even 
> aside from the proxy prefix, the url in their browser may include (or consist 
> entirely of) session gunk.
> 
> I'll try a regexp and see how far that gets me. I'm a bit trepidatious about 
> the way the DOI standard allows just about any character imaginable, but at 
> least there's the 10. prefix. Am also considering that if DOIs also appear in 
> the article's bibliography I'll need to make sure the javascript can 
> distinguish between them and the DOI for the article itself; but a lot of 
> this might be 'cross that bridge if I come to it' stuff.


Crap.  I just remembered :

http://shortdoi.org/

... I don't know if any publishers are actually using them, or if they're just 
for people to use on twitter & other social media.

The real problem with them is that they don't have the '10.' string in them.

You can probably get away with just tracking the resolving form of them:

http://doi[.]org/(\w+)

And ignore the 

10/(\w+)

form.

-Joe


Re: [CODE4LIB] Policies for 3D Printers

2013-05-20 Thread Joe Hourcle
On May 20, 2013, at 4:47 PM, Bigwood, David wrote:

> That's a question every library will have to answer for themselves. 
> 
> For us it makes perfect sense. Our scientists are sending out files to
> have 3D models of craters. When the price drops enough it will become
> more cost effective to do that in-house. It will just be an extension of
> maps and remote sensing data we already have in the collection. I can
> see a limit being fabrication related to the mission of the Institute,
> same as the large-format printer.
> 
> A public library might have other concerns. If it is unlimited and free,
> is printing out 100 Hulk statues to sell at a comic convention
> acceptable? How about Barbie dolls to sell at a flea market? Or maybe
> Barbee dolls to side-step trademarks? Lots of unanswered questions, but
> each library will have to decide based on local conditions.

Actually, this made me think back to my undergrad, when I worked
in our school's 'Academic Computing' department.  We had a big problem
with students printing out multiple copies of their thesis on the
printers in the computer labs, because they'd:

1. tie up the printers for a rather long time.
2. burn through all of the paper

The result was, one or two bad actors kept everyone else from being
able to use the services, because they were taking advantage of our
'free' printing.

Our typical process, when we found someone needed to print their
thesis was to print one copy from the printer in our staff offices,
and they then had to go to one of the local copy shops to make the
additional copies that they needed.  (the policy of only one copy
had been established for years, but was only really enforced when
people came in and complained about people printing whole books)


Although I can appreciate some of the arguments for making library
services free, there needs to be some sort of a line drawn so that
one or two people don't end up monopolizing a service.

Just as I left, they ended up going to a system of some number of
free pages per semester per student, with them having to pay if
they wanted to print more than their gratis quota.  I don't know
if something like that would work, but you'd have to work out how
to handle it.  (number of objects?  time spent on the printer?
amount of material used?)

-Joe


Re: [CODE4LIB] On-going support for DL projects

2013-05-17 Thread Joe Hourcle
On May 17, 2013, at 9:51 AM, Tim McGeary wrote:

> I'm interested in starting or joining discussions about best practices for
> on-going support for digital library projects.  In particular, I'm looking
> at non-repository projects, such as projects built on applications like
> Omeka.  In the repository context, there are initiatives like APTrust and
> DPN that are addressing on-going and long term collaborative support.  But,
> as far as I know, we aren't having the same types of discussions for DL
> projects that are application driven.

If you're asking about funding issues, most of those discussions that I've
seen lump it into 'governance'.


> There is no easy answer for this, so I'm looking for discussion.
> 
>   - Should we begin considering a cooperative project that focuses on
>   emulation, where we could archive projects that emulate the system
>   environment they were built?

I know that there are projects using emulation when it'd be too expensive
to port the software (and validate / vet it).  There are some that are
are setting up VMs for new software being written, so that they can
archive the whole environment to ensure that the proper version of the 
OS, libraries, etc. are captured.

Most of the ones that I've seen have been focusing on scientific
workflows, but that's likely because that's the field I'm in, so I tend
to see more of those talks at conferences than other subjects.


>   - Do we set policy that these types of projects last for as long as they
>   can, and once they break they are pulled down?

I wouldn't recommend that directly ... like anything, the stuff being
archived has a value, and if someone's willing to pay for it to be
continued, then you do it.  Maybe you just need to have a policy on
cost-recovery for when this happens.  (and then you need to look at
the various 'governance' discussions.)

>   - Do we set policy that supports these projects for a certain period of
>   time and then deliver the application, files, and databases to the faculty
>   member to find their own support?

The ultimate decision might be at a higher pay grade -- you may want
to come up with the list of options, estimated costs, and have the
provost or deans decide what makes sense for the budget.

>   - Do we look for a solution like the Way Back Machine of the Internet
>   Archive to try to present some static / flat presentation of these project?

Again, it likely depends on what's being archived.  An online database
that you can search / filter / interact with would be mostly useless
as static pages.

-Joe


Re: [CODE4LIB] DOI scraping

2013-05-17 Thread Joe Hourcle
On May 17, 2013, at 12:32 AM, Fitchett, Deborah wrote:

> Kia ora koutou,
> 
> I’m wanting to create a bookmarklet that will let people on a journal article 
> webpage just click the bookmarklet and get a permalink to that article, 
> including our proxy information so it can be accessed off-campus.
> 
> Once I’ve got a DOI (or other permalink, but I’ll cross that bridge later), 
> the rest is easy. The trouble is getting the DOI. The options seem to be:


> Can anyone think of anything else I should be looking at for inspiration?

4. Look for any strings that look like a DOI:

\b((?:http://dx.doi.org/|doi:|)10.[\d.]+/(?:\S+))

(as it sucks to code special things for each database, in case they change or 
you add a new one)

You can then fall back to #1 if necessary.


> Also on a more general matter: I have the general level of Javascript that 
> one gets by poking at things and doing small projects and then getting 
> distracted by other things and then coming back some months later for a 
> different small project and having to relearn it all over again. I’ve long 
> had jQuery on my “I guess I’m going to have to learn this someday but, um, 
> today I just wanna stick with what I know” list. So is this the kind of thing 
> where it’s going to be quicker to learn something about jQuery before I get 
> started, or can I just as easily muddle along with my existing limited 
> Javascript? (What really are the pros and cons here?)

It depends on what you're going to do with the output -- I'd likely look 
through the <a href> values for http://dx.doi.org DOIs first, then just look 
at the text displaying on the page.  I don't think you'd need jQuery for that.

-Joe
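To make option 4 concrete, here's roughly what the fallback scan looks like in Python (the same pattern ports to JavaScript for the bookmarklet). The regex is deliberately loose, and the trailing-punctuation trim is a guess that will need tuning against real pages:

```python
import re

# Optional dx.doi.org / doi: prefix, then the 10.<registrant>/<suffix> core.
DOI_RE = re.compile(r'\b(?:https?://dx\.doi\.org/|doi:)?(10\.[\d.]+/\S+)')

def find_dois(text):
    """Return unique candidate DOIs from a chunk of page text, in order."""
    found = []
    for match in DOI_RE.finditer(text):
        # Strip punctuation that probably just ends the sentence.
        doi = match.group(1).rstrip('.,;)')
        if doi not in found:
            found.append(doi)
    return found
```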


Re: [CODE4LIB] makerspaces in libraries workshp

2013-05-15 Thread Joe Hourcle
On May 15, 2013, at 8:30 AM, Edward Iglesias wrote:

> Hello All,
> 
> I have the unlikely distinction of getting to offer a 1 day workshop on
> Makerspaces in libraries.  I have a general idea of how it's going to go
> --morning theory afternoon hands on -- but am a little overwhelmed by the
> possibilities.  My first thought was to show them how to use a Raspberry Pi
> but that would require them all to buy a Raspberry Pi.  I am open to
> suggestions on what would be worth learning that is hands on and preferably
> cheap for a group of around 20.  What would you teach/learn in an afternoon
> given the chance?
> 
> Edward Iglesias


I'd make sure to mention that this does *not* have to be high-tech.

Our library runs jewelry-making workshops, and some of the local
churches have knitting circles / quilting bees so there can be a 
social component of 'making'.  They've never considered this to be
'makerspaces', but it fits the description.

If it were me, depending on how much time you had, I'd try to come
up with some sort of a project that people could build & take home
with them,  (and so the Raspberry Pi idea is likely out). Depending
on where you are, it might be a good time of year to make bird or
bat houses, or maybe something decorative.

Have them leave with a physical item that they can take and show
off to others.

Depending on how soon you'll get kicked out after your class ends,
you might be able to plan for building something, and then let
people stay later if they wanted to paint or otherwise decorate
it.

I'd plan on having someone cut all of the pieces in advance unless
it can be done w/ hand tools and you have a sufficient number of
the necessary tools ... ideally, you'd want something that could
be assembled with press-fit and glue, or maybe a few nails or screws.
(if you had to add hinges).

-Joe


If you really need an idea of something to make -- I can give you
plans for gift boxes that I make ... it's shadow-box that says
'in case of emergency, break glass', and you can then put whatever
you want in them.  (typically, I give 'em with pacifiers to 
friends having their first child ... but I've done other stuff,
like gave one w/ a box of kosher salt, peppercorns and whole
cumin to Alton Brown when he was doing a book signing back in
2004 or so)

It's simple pine, a plexiglass front, etc.  You'll need a table
saw, a miter box or chop saw and a label maker, and then it's just
a matter of glue, a few nails, and some sanding.

(you could also borrow a pneumatic brad nailer + a power sander,
so that once you get everyone to make the item, show that it 
can all be done in 1/10th the time w/ the proper tools ... which
is part of the reason for building out these spaces)


[CODE4LIB] FW: Digital Forensics Hackathon - June 3-5

2013-05-01 Thread Joe Hourcle
I thought this was something that might interest people in code4lib.

-Joe


> -Original Message-
> From: Cal Lee [mailto:cal...@email.unc.edu] 
> Sent: Wednesday, May 01, 2013 11:36 AM
> Subject: Digital Forensics Hackathon - June 3-5
> 
> We'll be running a hackathon in Chapel Hill on June 3-5 that will focus on 
> applying digital forensics methods to born-digital collections. 
> We're running this with the Open Planets Foundation, who have done a terrific 
> job in the past of running these events.
> 
> The format is one in which people bring real technical challenges (including 
> the associated data from their collections) to the event and pair up with 
> developers who can provide substantive solutions to those challenges by the 
> end of the three days.
> 
> http://wiki.opf-labs.org/display/KB/2013-06-03+OPF+Hackathon+-+Tackling+Real-World+Collection+Challenges+with+Digital+Forensics+Tools+and+Methods+%28Chapel+Hill%29
>  
> 
> 
> I'm very excited that we're running this event in Chapel Hill.  It's the 
> first time that this very successful OPF model has made it to the US. 
> It should be a great opportunity for all involved.
> 
> I would really appreciate any efforts you could take to help us with getting 
> the word out about it.  Broadcasts through mailing lists, Twitter and such 
> are all helpful.  Even better is pointing it out to specific individuals who 
> you think would be interested and would benefit from the event.
> 
> The deadline for booking a hotel room at the block rate is May 19.  But it's 
> even better if people sign up well before then, so we can make the 
> appropriate pairings and plan for the event.
> 
> - Cal


Re: [CODE4LIB] Tool to highlight differences in two files

2013-04-23 Thread Joe Hourcle
On Apr 23, 2013, at 8:12 PM, Genny Engel wrote:

> There's a list here that may be more along the lines of what you're seeking.
> 
> http://webapps.stackexchange.com/questions/11547/diff-for-websites


Hmm ... I guess I should actually accept the answer as it was the only one ever 
given.

-Joe


Re: [CODE4LIB] Tool to highlight differences in two files

2013-04-23 Thread Joe Hourcle
On Apr 23, 2013, at 4:37 PM, Alexander Duryee wrote:

> The absolute simplest way to do this would be to fire up a terminal
> (OSX/Linux) and:
> 
> diff page1.html page2.html | less
> 
> Unfortunately, this will also catch changes made in other markup, and
> may or may not be terribly readable.

At the very least, I'd suggest adding a '-b' which will ignore changes to 
whitespace.
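For a scripted comparison that doesn't depend on shell diff at all, Python's stdlib difflib can do the same job; a minimal sketch (the page contents and filenames here are invented stand-ins for the scraped files):

```python
import difflib

# Two snapshots of the same page (stand-ins for page1.html / page2.html).
old = "<p>Hours: 9-5</p>\n<p>Closed Sunday</p>\n".splitlines()
new = "<p>Hours: 9-6</p>\n<p>Closed Sunday</p>\n".splitlines()

# Unified diff, like `diff -u`; lines starting with - or + are the changes.
for line in difflib.unified_diff(old, new, fromfile="page1.html",
                                 tofile="page2.html", lineterm=""):
    print(line)

# HtmlDiff renders a side-by-side HTML table of the differences, which is
# handy for eyeballing scraped pages; open the result in a browser.
report = difflib.HtmlDiff().make_file(old, new, "page1.html", "page2.html")
```

difflib has no direct equivalent of diff's '-b', but you can normalize whitespace on each line before comparing to get the same effect.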

Also see:

http://www.w3.org/wiki/HtmlDiff

-Joe


> On Tue, Apr 23, 2013 at 4:31 PM, Alevtina Verbovetskaya
>  wrote:
>> I've recently begun to use Beyond Compare: http://www.scootersoftware.com/ 
>> It's not free or OSS, though.
>> 
>> There's also a plugin for Notepad++ that does something similar: 
>> http://sourceforge.net/projects/npp-compare/ This is free, of course.
>> 
>> Thanks!
>> Allie
>> 
>> 
>> -Original Message-
>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
>> Wilhelmina Randtke
>> Sent: Tuesday, April 23, 2013 4:24 PM
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: [CODE4LIB] Tool to highlight differences in two files
>> 
>> I would like to compare versions of a website scraped at different times to 
>> see what paragraphs on a page have changed.  Does anyone here know of a tool 
>> for holding two files side by side and noting what is the same and what is 
>> different between the files?
>> 
>> It seems like any simple script to note differences in two strings of text 
>> would work, but I don't know a tool to use.
>> 
>> -Wilhelmina Randtke


[CODE4LIB] password lockboxes (was: what do you do: API accounts used by library software, that assume an individual is registered)

2013-03-05 Thread Joe Hourcle
On Mar 5, 2013, at 8:29 AM, Adam Constabaris wrote:

> An option is to use a password management program (KeepassX is good because
> it is cross platform) to store the passwords on the shared drive, although
> of course you need to distribute the passphrase for it around.

So years ago, when I worked for a university, they wanted us to put all of the 
root passwords into an envelope, and give them to management to hold.  (we were 
a Solaris shop, so there actually were root passwords on the boxes, but you had 
to connect from the console or su to be able to use 'em).

We managed to drag our heels on it, and management forgot about it*, but I had 
an idea ...

What if there were a way to store the passwords similar to the secret formula 
in Knight Rider?

Yes, I know, it's an obscure geeky reference, and probably dates me.  The story 
went that the secret bullet-proof spray-on coating wasn't held by any one 
person; there were three people who each knew part of the formula, and that any 
two of them had enough knowledge to make it.

For needing 2 of 3 people, the process is simple -- divide it up into 3 parts, 
and each person has a different missing bit.  This doesn't work for 4 people, 
though (either needing 2 people, or 3 people to complete it).
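The 2-of-3 version described above is easy to sketch: cut the secret into three pieces and give each person every piece except one (a different one per person), so any two people jointly hold all three. A toy Python illustration (the secret is invented; a real arbitrary "X of Y" scheme would use something like Shamir secret sharing):

```python
# Naive 2-of-3 sharing: person i gets every piece except piece i.
# Any two people together hold all three pieces; no one alone does.
# (Illustration only, not a cryptographically sound scheme.)

def split_2_of_3(secret: str) -> list:
    third = (len(secret) + 2) // 3
    pieces = [secret[i * third:(i + 1) * third] for i in range(3)]
    return [{j: pieces[j] for j in range(3) if j != i} for i in range(3)]

def recombine(share_a: dict, share_b: dict) -> str:
    pieces = {**share_a, **share_b}   # the union covers all three pieces
    return "".join(pieces[j] for j in range(3))

shares = split_2_of_3("hunter2-root-pw")
```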

You could probably do it for two or three classes of people (eg, you need 1 
sysadmin + 1 manager to unlock it), but I'm not sure if there's some method to 
get an arbitrary "X of Y" people required to unlock.

If anyone has ideas, send 'em to me off-list.  (If other people want the 
answer, I can aggregate / summarize the results, so I don't end up starting yet 
another inappropriate out-of-control thread)

...

Oh, and I was assuming that you'd be using PGP, using the public key to encrypt 
the passwords, so that anyone could insert / update a password into whatever 
drop box you had; it'd only be taking stuff out that would require multiple 
people to combine efforts.

-Joe


* or at least, they didn't bring it up again while I was still employed there.


Re: [CODE4LIB] what do you do: API accounts used by library software, that assume an individual is registered

2013-03-04 Thread Joe Hourcle
On Mar 4, 2013, at 11:11 AM, Jonathan Rochkind  wrote:

> Whether it's Amazon AWS, or Yahoo BOSS, or JournalTOCs, or almost anything 
> else -- there are a variety of API's that library software wants to use, 
> which require registering an account to use.

[trimmed]

> Has anyone found a way to deal with this issue, other than having each API 
> registered to an account belonging to whatever individual staff happened to 
> be dealing with it that day?

The government actually has a program for this.


http://www.howto.gov/web-content/resources/tools/terms-of-service-agreements

If you work for the feds there are some alternate terms of services for various 
"Social Media Providers" (it actually covers more than what I think of as 
"social media").  So far, they've only really looked at 'free' services.

It's a little bit tricky to use them, as you have to find out if your 
government agency has yet agreed to the terms that a company is offering.  If 
they don't have an agreement ... well, it takes some time to get the approval, 
as it's got to go through the agency's legal counsel.

If you're with a state government (and most state universities are considered 
state government), then there are alternate TOSes available for Twitter, 
Facebook and YouTube.

-Joe


Re: [CODE4LIB] A newbie seeking input/suggestions

2013-02-21 Thread Joe Hourcle
On Feb 21, 2013, at 2:28 PM, Cab Vinton wrote:

> This seems like a good application for text messaging -- as long as
> all librarians have smartphones, which they surely would at Yale :-)


The problem is that you'd have to have it dynamically generate the list of who 
to text based on who's currently on duty.

Otherwise, you have it harassing people on their days off, when they're home 
sick, etc.

-Joe


Re: [CODE4LIB] A newbie seeking input/suggestions

2013-02-21 Thread Joe Hourcle
On Feb 21, 2013, at 11:20 AM, Paul Butler (pbutler3) wrote:

> For something like this I would go the hardware route.  A walkie-talkie on a 
> charging stand at each service point. The walkie-talkies would always be on 
> and tuned to the same channel. That way the staff person is not tied to the 
> PC itself, they can grab the walkie-talkie and still do what they need to do 
> - like head to the stacks or look for that reserve material. No phone number 
> to remember. This solution could help with other issues, like security and 
> system/network outages. 

I admit, I've never worked as a librarian, but I did work at a computer help 
desk during undergrad.

We had a policy of trying our best *not* to go into the computer labs, because 
if you did, you'd get 6+ people who suddenly had questions they wanted to ask 
... but couldn't have been bothered to actually go to the office to ask.  When 
I first started, someone who went to go add paper to a printer might not come 
back for 30+ minutes.

(I realize that this policy likely won't work for a library, though)

Our follow-up policy was to not answer questions in the labs, and to make them 
go to the office, so they didn't cut in line if there were people queued up.

... so I completely agree about needing something that's not fixed to a single 
location.  If you can make it beep on demand, that's even better.  ("oops, 
sorry, I've got to go, I've been summoned back to the desk")

If you're going to do something that's computer-based, I'd be inclined to think 
about some sort of phone app, or even part of a more comprehensive tool to 
assist in other things that you might need while you're in the stacks trying to 
help someone.

-Joe


Re: [CODE4LIB] You *are* a coder. So what am I?

2013-02-18 Thread Joe Hourcle
On Feb 18, 2013, at 11:17 AM, John Fereira wrote:

> I suggested PHP primarily because I find it easy to read and understand and 
> that's it's very commonly used.  Both Drupal and Wordpress are written in PHP 
> and if we're talking about building web pages there are a lot of sites that 
> use one of those as a CMS.

And if you're forced to maintain one of those, then by all means, learn PHP ... 
but please don't recommend that anyone learn it as a first language.

... and I'd like to say that in my mention of Perl, it was only because there's 
going to be the workshop ... not that I'd necessarily recommend it as a first 
language for all people ... I'd look at what they were interested in trying to 
do, and make a recommendation on what would best help them do what they're 
interested in.



> I've looked at both good and bad perl code, some written some very 
> accomplished software developers, and I still don't like it.   I am not 
> personally interested in learning to make web pages (I've been making them 
> for 20 years) and have mostly dabbled in Ruby but suspect that I'll be doing 
> a lot more programming in Ruby (and will be attending the LibDevConX workshop 
> at Stanford next month where I'm sure we'll be discussing Hydra).   I'm also 
> somewhat familiar with Python but I just haven't found that many people are 
> using it in my institution (where I've worked for the past 15 years) to spend 
> any time learning more about it.  If you're going to suggest mainstream 
> languages I'm not sure how you can omit Java (though just mentioning the word 
> seems to scare people).

It's *really* easy to omit Java:

http://www.recursivity.com/blog/2012/10/28/ides-are-a-language-smell/

... not to mention all of the security vulnerabilities and memory headaches 
associated with anything that runs in a VM.

You might as well ask why I didn't suggest C or assembler for beginners.  
That's not to say that I haven't learned things from programming in those 
languages (and I've even applied tricks from Fortran and IDL in other 
languages), but I wouldn't recommend any of those languages to someone who's 
just learning to program.

-Joe

(ps. I'm grumpier than usual today, as I've been trying to get hpn patched 
openssh to compile under centos 6 ... so that it can be called by a java daemon 
that is called by another C program that dynamically generates python and shell 
scripts ... and executes them but doesn't always check the exit status ... this 
is one of those times when I wish some people hadn't learned to program, so 
they'd just hire someone else to write it)


Re: [CODE4LIB] You *are* a coder. So what am I?

2013-02-17 Thread Joe Hourcle
On Feb 17, 2013, at 11:43 AM, John Fereira wrote:

> I have been writing software "professionally" since around 1980 and first 
> encountered perl in the early 1990s or so and have *always* disliked it.   
> Last year I had to work on a project that was mostly developed in perl and it 
> reminded me how much I disliked it.  As a utility language, and one that I 
> think is good for beginning programmers (especially for those working in a 
> library) I'd recommend PHP over perl every time.  

I'll agree that there are a few aspects of Perl that can be confusing, as some 
functions will change behavior depending on context, and there was a lot of bad 
code examples out there.* 

... but I'd recommend almost any current mainstream language before 
recommending that someone learn PHP.

If you're looking to make web pages, learn Ruby.

If you're doing data cleanup, Perl if it's lots of text, Python if it's mostly 
numbers.

I should also mention that in the early 1990s would have been Perl 4 ... and 
unfortunately, most people who learned Perl never learned Perl 5.  It's changed 
a lot over the years.  (just like PHP isn't nearly as insecure as it used to be 
... and actually supports placeholders so you don't end up with SQL injections)

-Joe


Re: [CODE4LIB] editing code4lib livestream - preferred format

2013-02-15 Thread Joe Hourcle
On Feb 15, 2013, at 2:30 PM, Matthew Sherman wrote:

> Not to be snarky, but wouldn't the session on HTML5 video tell you what you
> need to know?

Code it in 3+ different formats, and stack your <source> tags in hope that
you've used enough different codecs that the browser actually
supports one of them?

http://caniuse.com/#feat=video,ogv,webm,mpeg4

... then fall back to a synchronized slide show / audio:

http://caniuse.com/#feat=audio,svg-smil

... then fall back to Flash or some other security risk.


(or did they have some other solution?)

-Joe



> 
> On Fri, Feb 15, 2013 at 1:20 PM, Tara Robertson 
> wrote:
> 
>> Hi,
>> 
>> I'm editing the video from code4lib into the sesison chunks.
>> 
>> What format should I export the videos as? Anything else I should be aware
>> of?
>> 
>> Thanks,
>> Tara
>> --
>> 
>> Tara Robertson
>> 
>> Accessibility Librarian, CILS 
>> 
>> T  604.323.5254
>> F  604.323.5954
>> trobert...@langara.bc.ca
>> 
>> Langara. 
>> 
>> 100 West 49th Avenue, Vancouver, BC, V5Y 2Z6
>> 


Re: [CODE4LIB] You *are* a coder. So what am I?

2013-02-15 Thread Joe Hourcle
On Feb 15, 2013, at 12:27 PM, Kyle Banerjee wrote:
> On Fri, Feb 15, 2013 at 6:45 AM, Diane Hillmann 
> wrote:
> 
>> I'm all for people learning to code if they want to and think it will help
>> them. But it isn't
>> the only thing library people need to know, and in fact, the other
>> key skill needed is far rarer: knowledge of library data...
>> 
>> ...More useful, I think, is for each side of that skills divide to value
>> the skills
>> of the other, and learn to work together
> 
> 
> Well put. No amount of technical skill substitutes for understanding what
> people are actually doing -- it's very easy to write apps that nail any set
> of specifications and then some but are still totally useless.
> 
> Even if you never intend to do any programming, it's still useful to know
> how to code because it will help you know what is feasible, what questions
> to ask, and how to interpret responses.
> 
> That doesn't mean you need to know any particular language. It does mean
> you need to grok the fundamental methodologies and constraints.

And the vocabulary (which Alison also mentioned; those who have read
Stranger in a Strange Land know that 'grok' was also associated with
understanding the language well enough to explain what something was.)

I've had *way* too many incidents where the problem was simply
mis-communication because one group was using a term that
had a specific meaning to the other group with some other intended
meaning.  I even gave a talk last year on the problem:


http://www.igniteshow.com/videos/polysemous-terms-did-everyone-understand-your-message

And one of the presenters earlier that night touched on the issue,
for scientists talking to politicians and the public:


http://www.igniteshow.com/videos/return-jedis-so-what-making-your-science-matter


It takes more than just people skills to coordinate between the 
customers & the software people.*  Being able to translate between
the problem domain's jargon and the programmers (possibly via some
requirements language, like UML), or even just normalizing metadata
between the sub-communities is probably 25-50% of my work.

As a quick example, there's 'data' ... it means something completely
different if you're dealing with scientists, programmers, or
information scientists.  For the scientists, metadata vs. data is
a legitimate distinction as not all of what programmers would
consider 'data' is considered to be 'scientific data'.

-Joe

* http://www.youtube.com/watch?v=mGS2tKQhdhY


[CODE4LIB] Learning programming & data (was: You *are* a coder. So what am I?)

2013-02-15 Thread Joe Hourcle
On Feb 15, 2013, at 10:26 AM, Chris Gray wrote:

> Yes.  Exactly.  It's like saying you can't go to the doctor or hire a lawyer 
> without a bit of medical or law school.  Doctors and lawyers need to be able 
> to explain what they're doing.
> 
> Another skill that would be useful is understanding databases, by which I do 
> not mean learning SQL.  Too many people's idea of working with data is Excel, 
> which provides no structure for data. Type in any data in any box.  There is 
> none of the data integrity that a database requires.  Here my ideal is 
> "Database Design for Mere Mortals" which teaches no SQL at all but teaches 
> how to work from data you know and use and arrive at a structure that could 
> easily be put into a database.  It's not just data, but data structure that 
> needs to be understood.  I've seen plenty of evidence that people who build 
> commercial database-backed software don't understand database structure.
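The kind of integrity the quoted paragraph describes can be shown in a few lines with Python's built-in sqlite3; the table, columns, and values here are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Unlike a spreadsheet cell, each column declares what it will accept.
conn.execute("""
    CREATE TABLE loans (
        barcode   TEXT NOT NULL,
        due_date  TEXT NOT NULL,
        renewals  INTEGER NOT NULL CHECK (renewals >= 0)
    )
""")
conn.execute("INSERT INTO loans VALUES ('39001005432', '2013-03-01', 0)")

# "Type in any data in any box" fails here: the constraints reject it.
try:
    conn.execute("INSERT INTO loans VALUES ('39001005433', NULL, -2)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

The NOT NULL and CHECK clauses are exactly the structure a spreadsheet lacks: the bad row never gets in, instead of sitting there silently until a report breaks.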


I don't know of one specifically for the library community, but there are some 
courses on the topic for the science community on learning how to use 
scientific databases, or to develop their own.

Two that I know well are Kirk Borne at GMU and Peter Fox and his cohorts at 
RPI, and there's been an effort from the Federation of Earth Science 
Information Partners (ESIP) to put together short presentations on various 
related topics: 

http://classweb.gmu.edu/kborne/ 
http://tw.rpi.edu/wiki/Peter_Fox
http://wiki.esipfed.org/index.php/Data_Management_Short_Course


With the need for expertise in data management, there's also been a push to 
teach librarians in data curation & data management at Syracuse*, UIUC and 
recently started at UNC.

http://eslib.ischool.syr.edu/
http://cirss.lis.illinois.edu/CollMeta/dcep.html

http://sils.unc.edu/programs/graduate/post-masters-certificates/data-curation

And, another conference that I'm helping to organize, the Research Data Access 
and Preservation (RDAP) Summit, also being held in Baltimore this year (April 
4-5, co-located with the IA Summit)**.  It's been a place for the science, 
library and archives community to discuss issues (and solutions) that we're 
facing; it can be an interesting overview for librarians who are starting to 
look into the management of data.  See the 'Resources' page for links to 
articles summarizing past years & videos of the talks from last year.***

http://www.asis.org/rdap/


-Joe


* disclaimer : I gave an invited talk to one of the Syracuse eScience classes a 
couple of years back.

** I know, you're thinking, 'what idiot would be involved with organizing two 
events being held weeks apart?' ... but I'm not ...  I'm organizing three, so 
if you know any craft vendors who might be interested in participating in a 
street festival in Upper Marlboro, Maryland the day before Mother's Day : 
http://MarlboroughDay.org/ .  (yes, it's the Marlboro of tobacco & horse fame, 
but we don't have cowboys)

*** although, my talk's particularly bad, as I wasn't expecting to actually 
give it 'til two of my three speakers bowed out at the last minute.  But both 
Peter Fox & Kirk Borne spoke in other sessions, and lots of other interesting 
people.

ps. and um ... the thing about people making database software who don't 
understand data structures ... that's also part of my complaint about that 
project with people writing software that they shouldn't have ... storing 
journaled data in the same table, with no indexes, so an RDBMS becomes a 
document store as there are only two useful accessors (one of which has to be 
checked to see if it's been deprecated by another record because of the 
journaling)


Re: [CODE4LIB] You *are* a coder. So what am I?

2013-02-15 Thread Joe Hourcle
On Feb 15, 2013, at 9:00 AM, Lin, Kun wrote:

> Wow, interesting. But I am not a fan of Perl. Is there another workshop?

I don't know of any full workshops in the area, but there are plenty
of monthly or semi-monthly meetings of different groups:

Python: http://dcpython.org/

R : http://www.meetup.com/R-users-DC/

Groovy: http://www.dcgroovy.org/

Drupal: http://groups.drupal.org/washington-dc-drupalers

Hadoop: http://www.meetup.com/Hadoop-DC/

Ruby:   http://www.dcrug.org/

ColdFusion: http://www.cfug-md.org/


For those not in this area, see:

http://www.pm.org/groups/
http://wiki.python.org/moin/LocalUserGroups
http://r-users-group.meetup.com/
http://groups.drupal.org/
http://www.ruby-lang.org/en/community/user-groups/
http://www.haskell.org/haskellwiki/User_groups
http://coldfusion.meetup.com/

-Joe


Re: [CODE4LIB] You *are* a coder. So what am I?

2013-02-15 Thread Joe Hourcle
On Feb 15, 2013, at 8:22 AM, Kyle Banerjee wrote:

> On Thu, Feb 14, 2013 at 7:40 AM, Jason Griffey  wrote:
> 
>> The vast, vast, vast, vast majority of people have absolutely no clue how
>> code translates into instructions for the magic glowing screen they look at
>> all day. Even a tiny bit of empowerment in that arena can make huge
>> differences in productivity and communication abilities
>> 
> 
> This is what it boils down to.
> 
> C4l is dominated by linux based web apps. For people in a typical office
> setting, the technologies these involve are a lousy place to start learning
> to program. What most of them need is very different than what is discussed
> here and it depends heavily on their use case and environment.
> 
> A bit of VBA, vbs, or some proprietary scripting language that interfaces
> with an app they use all the time to help with a small problem is a more
> realistic entry point for most people. However, discussion of such things
> is practically nonexistent here.

Well, as you mention that ... I'm one of the organizers of the 
DC-Baltimore Perl Workshop:

http://dcbpw.org/dcbpw2013/

Last year, we targeted the beginner's track as a sort of 'Perl
as a second language', assuming that you already knew the basic
concepts of programming (what's a variable, an array, a function,
etc.)

Would it be worth us aiming for an even lower level of expertise?

-Joe

ps.  Students & the unemployed are free ... $25 before March 1st,
 $50 after; will be April 20th at U. Baltimore.  We're also
 in talks with a training company to have either another track
 of paid training or a separate day (likely Sunday); they
 wouldn't necessarily be Perl-specific.


Re: [CODE4LIB] You *are* a coder. So what am I?

2013-02-14 Thread Joe Hourcle

On Thu, 14 Feb 2013, Jason Griffey wrote:


> On Thu, Feb 14, 2013 at 10:30 AM, Joe Hourcle wrote:
> 
>> Two, 'coding' is a relatively minor skill.  It's like putting 'typist' as
>> a job title, because you use your keyboard a lot at work.  Figuring out
>> what needs to be written/typed/coded is more important than the actual
>> writing aspect of it.
> 
> Any skill is minor if you already have it. :-)
> 
> As others have pointed out, learning even a tiny, tiny bit of code is a
> huge benefit for librarians. The vast, vast, vast, vast majority of people
> have absolutely no clue how code translates into instructions for the magic
> glowing screen they look at all day. Even a tiny bit of empowerment in that
> arena can make huge differences in productivity and communication
> abilities. Just understanding the logic behind code means that librarians
> have a better understanding of what falls into the "possible" and
> "impossible" categories for "doing stuff with a computer" and anything that
> grounds decision making in the possible is AWESOME.


It's true ... and learning lots of different programming languages makes 
you think about the problem in different ways*


But equally important is knowing that's it's just one tool.  It's like the 
quote, 'when you have a hammer, everything's a nail'.


... and more often than people realize, the correct answer is not to write 
code, or to write less of it.


I remember once, I had inherited a project where they were doing this 
really complex text parsing, and we'd spend a month or so of man-hours on 
it each year.  My manager quit, so I got to meet with the 'customer'.** 
I told her some of the more problematic bits, and some of them were things 
that she hadn't liked, so used it to push back and get things changed
upstream.  The next year, I was able to shave a week off the turn-around 
time.


For the last few years, I've been dealing with software that someone 
wrote when what they *should* have done was survey what was out there, and 
figure out which one met their needs, and if necessary, adapt it slightly. 
Instead, they wrote massive, complex systems that were unnecessary.  And now 
we've got to support it, as there isn't the funding to convert it all over 
to something that has a broad community of support.


(and I guess that's one of my issues against 'coders' ... anyone who 
writes code should be required to support it, too ... I've done the 
'developer', 'sysadmin' and 'helpdesk' roles individually ... and when 
some developer makes a change that causes you to get 2am wakeup calls when 
the server crashes every night for two weeks straight,*** but they of 
course can't roll back, because 'but it's in production now, as it passed 
our testing'.)


-Joe

ps.  I like Stuart's 'Library Systems Specialist' title for those who
 actually work in libraries.

pps. Yes, I should actually be writing code right now.


* procedural, functional, OO, ... I still haven't wrapped my head around
  this whole 'noSQL' movement, and I used to manage LDAP servers and
  *love* hierarchical databases.  (even tried to push for its use in our
  local registry ... I got shot down by the others on the project).

** we were generating an HTML version of the schedule of classes based on
   the export generated from QuarkXPress, which was used to typeset the
   book.  The biggest problem was dealing with a department code that had
   an ampersand in it, and the hack that we did to the lexer to deal with
   it doubled the time of each run.  (and they made enough changes
   year-to-year that the previous year's script never worked right out the
   bat, so we'd have to run it, verify, tweak the code, re-run, etc.)

*** they never actually fixed the problem.  I put in (coded?) a watchdog
script that'd check every 60 sec. if ColdFusion was down, and if so,
start it back up again.  So only the times when the config got
corrupted did I have to manually intervene.  By the time I was fired
(long story, unrelated), it was crashing 5-10 times a day.
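That kind of watchdog is only a few lines; a hedged Python sketch (the probe and restart hooks below are stand-ins, and the original was presumably a shell script, with a real restart doing something like subprocess.call on the service's init script):

```python
import time

def watchdog(is_up, restart, interval=60, max_checks=None):
    """Poll is_up() every `interval` seconds; call restart() when it fails.

    Runs forever when max_checks is None; returns the restart count.
    """
    restarts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        if not is_up():
            restart()
            restarts += 1
        checks += 1
        time.sleep(interval)
    return restarts

# Fake probes standing in for a real service check (e.g. an HTTP request
# or a pidfile test against the ColdFusion process).
state = {"up": False}
def is_up():
    return state["up"]
def restart():
    state["up"] = True   # pretend the restart succeeded

n = watchdog(is_up, restart, interval=0, max_checks=3)
```

In practice the same loop is often replaced by a one-shot check run from cron every minute, which also survives the watchdog itself dying.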

