Re: [CODE4LIB] date fields

2016-07-11 Thread Kyle Banerjee
Is the idea that this new field would be stored as MARC in the system (the
ILS)?

If so, the 9xx solution already suggested is probably the way to go if the
008 route suggested earlier won't work for you. Otherwise, you run a risk
that some form of record maintenance will blow out all your changes.

The actual use case you have in mind makes a big difference in what paths
make sense, so more detail might be helpful.
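
If the 9xx route wins, the round trip is only a few lines of code. Here's a
rough sketch in Python -- assuming pymarc 5.x, binary MARC files, and the
901 $a/$b layout Joy suggests below; the normalization rules are
illustrative, not exhaustive:

import re
from pymarc import MARCReader, Field, Subfield

def normalize_date(raw):
    # Reduce strings like 'c1900', '[1900]', '190?', '19--' to a year
    m = re.search(r"(\d{4})", raw or "")
    if m:
        return m.group(1)
    m = re.search(r"(\d{2,3})(?=[-?u])", raw or "")  # '190?', '19--' styles
    return m.group(1).ljust(4, "0") if m else None

with open("records.mrc", "rb") as fh, open("out.mrc", "wb") as out:
    for record in MARCReader(fh):
        f260 = record.get_fields("260")
        dates = f260[0].get_subfields("c") if f260 else []
        year = normalize_date(dates[0]) if dates else None
        if year:
            record.add_field(Field(
                tag="901",
                indicators=[" ", " "],
                subfields=[Subfield("a", year),
                           Subfield("b", "normalized date for project XYZ")]))
        out.write(record.as_marc())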

kyle



On Mon, Jul 11, 2016 at 1:06 PM, Jonathan Rochkind 
wrote:

> There's some super useful data in the MARC fixed fields too -- more useful
> than the semi-transcribed values in 260c, although it's also a pain to
> access/transform to something reasonably machine actionable.
>
> Here's the code from traject that tries to get a reasonable date out of
> marc fixed fields, falling back to 260c if it needs to.
>
> https://github.com/traject/traject/blob/e98fe35f504a2a519412cd28fdd97dc514b603c6/lib/traject/macros/marc21_semantics.rb#L299-L379
>
> There are already quite a few places in MARC for dates. It's just they're
> all weird. You're making up yet a new kind of date to your own local
> meaning and specs. I doubt there's an existing MARC field you can put it in
> where it won't just add to the confusion. (obligatory reference to
> https://xkcd.com/927/).
>
> I'd just put it in a 9xx or xx9 field of your choosing, they are reserved
> for local use.
>
> On Mon, Jul 11, 2016 at 3:19 PM, Joy Nelson 
> wrote:
>
> > Hi Eric-
> > Are you planning on storing the 'normalized' dates forever in the MARC?
> > i.e. leave the c1900 in the 260$c and have 1900 in another place?
> >
> > I think what you do depends on your ILS and tools.  My first reaction
> would
> > be to stash the date in an unused subfield in the 260.  If your system
> > allows you to add 'non standard' subfields, you could use 260$z to stash
> > it.
> >
> > But, then I start to think that might rankle some catalogers to have 'non
> > standard' date data in the 260 (or 264).  I would probably then look at
> > using one of the local use tags.  901-907, 910, or 945-949.  You could
> > put the date in $a and even a brief description in a second subfield.
> > 901$a1900$bnormalized date for project XYZ -initials/date
> >
> > -Joy
> >
> > On Mon, Jul 11, 2016 at 12:51 PM, Eric Lease Morgan 
> > wrote:
> >
> > > I’m looking for date fields.
> > >
> > > Or more specifically, I have been given a pile o’ MARC records, and I
> > will
> > > be extracting for analysis the values of dates from MARC 260$c. From
> the
> > > resulting set of values — which will include all sorts of string values
> > > ([1900], c1900, 190?, 19—, 1900, etc.) — I plan to normalize things to
> > > integers like 1900. I then want to save/store these normalized values
> > back
> > > to my local set of MARC records. I will then re-read the data to create
> > > things like timelines, to answer questions like “How old is old?”, or
> to
> > > “simply” look for trends in the data.
> > >
> > > What field would y’all suggest I use to store my normalized date
> content?
> > >
> > > —
> > > Eric Morgan
> > >
> >
> >
> >
> > --
> > Joy Nelson
> > Director of Migrations
> >
> > ByWater Solutions 
> > Support and Consulting for Open Source Software
> > Office: Fort Worth, TX
> > Phone/Fax (888)900-8944
> > What is Koha? 
> >
>


Re: [CODE4LIB] Formalizing Code4Lib?

2016-06-14 Thread Kyle Banerjee
On Tue, Jun 14, 2016 at 9:05 AM, Miles Fidelman 
wrote:

> I'm rather surprised that nobody has suggested contacting:
> - the American Library Association (particularly the LITA division)
> - the Internet Archive
>
> Or... the Tides Foundation (tides.org in San Francisco) has been known to
> act as fiscal agent and "umbrella" for small non-profit projects/groups.
>
> Or... maybe even the Apache Software Foundation or FSF.


Even if another organization is willing to serve in this capacity, it is
essential to understand exactly what that means. How independent would c4l
be under the arrangement? Would the relationship alter the nature of c4l
itself?

For example, if LITA steps up, would people need to be LITA members to
attend events? Even if they don't have to be, would there be a shift in
participation? How much say would LITA have over format, policies, etc?

One of the challenges of fundraising for c4l meetings is that a lot of
people and companies (understandably) want to earmark their donations
regardless of what is actually needed. Presumably, anyone willing to take
on much greater financial and administrative headaches will attach some
strings.

There are real advantages to working with other organizations, but there
are downsides as well.

kyle


Re: [CODE4LIB] Formalizing Code4Lib?

2016-06-07 Thread Kyle Banerjee
On Tue, Jun 7, 2016 at 2:59 PM, Salazar, Christina <
christina.sala...@csuci.edu> wrote:

> Having gone to C4L in 2007 in Athens, when it was I think 150 people (ha!
> Let's be honest: 145 men and 5 women) and then again in 2015 in Portland
> and 2014 in Raleigh, the Code 4 Lib that once was is no more. Long live
> Code4Lib.
>
> If we continue to want a large conference we need a better fiduciary
> agent. Take the fact that so few folks are willing to put bids to host as a
> sign that something different is happening here from what used to be 10
> plus years ago. (Wait, damn! Am I THAT old???)
>
> I'm not saying that all the changes that have happened over time have been
> bad (see my observation of gender balance above) but I think the large
> annual conference specifically needs to be thought through.
>
> How do we approach thinking it through? I have no idea but as others have
> said, the conversation is long overdue. (I wonder when Ruth says "Clearly
> the community wants to go" WHAT "the community" wants to go TO? Would we
> even be able to come to an agreement on that?)
>

This.

My recollection is that in the bad ol' days, c4l was much more about
sharing ideas to solve practical problems. The conference was like that too
-- people sometimes delivered lightning talks based on ideas sparked by a
presentation that had just been given. There was a lot more nitty-gritty
tech in the offline fun. Getting involved was simply
a matter of showing up. The conference was a chance to get together with
people you'd been working with remotely.

Nowadays, the conference (which has become like other library conferences)
has become an end in itself. It seems to take more energy than everything
else combined and the lion's share of the messages on this list are
announcements or administrative in nature. Communication has shifted from a
hive mind dynamic where everyone contributes towards one where a few push
information out to the many -- this presents barriers to participation and
contributes to people feeling like outsiders.

Both c4l and the conference have changed a great deal over the years, and
whatever path we continue on deserves some discussion. The worst case
scenario is that we don't reach an agreement on how to proceed and things
break into smaller pieces. That wouldn't be so bad because there is
plenty of great action to be had in smaller and regional venues.

kyle


Re: [CODE4LIB] JPEG question

2016-05-25 Thread Kyle Banerjee
Is that 1524 dpi for Batch A a misprint? If not, that's very likely to be
your problem -- I doubt that's what the vendor really scanned at.

If you change the dpi values and try to reload, my guess is you'll get very
different results.
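
If you want to test that without waiting on the vendor, the density tag is
easy to inspect and rewrite. A quick sketch in Python with Pillow --
filenames are made up, and note this re-encodes the JPEG, so use copies
rather than your masters:

from PIL import Image

img = Image.open("batch_a_page.jpg")
print(img.size, img.info.get("dpi"))  # pixel dimensions and claimed dpi
img.save("batch_a_page_300dpi.jpg", dpi=(300, 300), quality=95)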

kyle

On Wed, May 25, 2016 at 2:40 PM, Bernadette Houghton <
bernadette.hough...@deakin.edu.au> wrote:

> We have had some test scans made of a pressed flower album, and are
> mightily puzzled by the difference in quality when we process the resulting
> JPGs through BookReader. There are 2 batches, each taken by different 3rd
> parties.
>
> For Batch A, the original JPGs are ~3-4MB each, 1524 dpi, 24 bit depth.
> When passed through the BookReader, the resulting JPGs are ~20-25KB, 150
> dpi, 24bit depth.
>
> For Batch B, the original JPGs are ~700KB-1MB each, 200 dpi, 24 bit depth.
> When passed through the very same BookReader, the resulting JPGs are
> ~80-85KB, 150 dpi, 24 bit depth.
>
> For Batch A, the quality when viewing in the BookReader is atrocious. Not
> surprising, I guess, given the resulting files are so small. Batch B comes
> up 'OK' in the BookReader.
>
> What I can't figure out is why is it that Batch A comes out most poorly
> after being processed by BookReader, given the original file sizes and
> resolution are so much larger than Batch B. Can anyone shed any light?
>
> When I look up the exif data of the files, Batch A has a compression
> factor of 6; while batch B is 1. Could this have anything to do with it?
>
> Thanks.
>
> Bernadette Houghton
> Digitisation and Preservation Librarian
> Library
> Deakin University
> Locked Bag 2, Geelong, VIC 3220
> +61 3 52278230
> bernadette.hough...@deakin.edu.au >
> http://orcid.org/-0001-5730-8805
> www.deakin.edu.au
> Deakin University CRICOS Provider Code 00113B
>
>
> Important Notice: The contents of this email are intended solely for the
> named addressee and are confidential; any unauthorised use, reproduction or
> storage of the contents is expressly prohibited. If you have received this
> email in error, please delete it and any attachments immediately and advise
> the sender by return email or telephone.
>
> Deakin University does not warrant that this email and any attachments are
> error or virus free.
>


Re: [CODE4LIB] Anything Interesting Going on in Archival Metadata?

2016-05-24 Thread Kyle Banerjee
On Tue, May 24, 2016 at 6:57 AM, Matt Sherman 
wrote:

>   Is linked data even useful in a setting with extremely unique
> materials? 


IMO, linked data is especially useful with unique materials because
relationships are simultaneously more important and more difficult to trace.

Having said that, archival collections often have minimal access points. As
a practical matter, this means that the challenge has more to do with
building the knowledgebase and creating the access points than implementing
any particular standards/technology.

kyle


Re: [CODE4LIB] All URLs redirect to mod_rewrite error page

2016-05-09 Thread Kyle Banerjee
Howdy Justin, 

We don't have enough info to diagnose what's going on, but DB corruption
strikes me as a more likely cause of your headaches than OS or Apache issues.

Based on your description, it sounds like Omeka thinks it is not properly
installed -- i.e. mod_rewrite is probably fine. Before tearing your hair out
messing with Apache, I'd recommend performing a full DB recovery and
verifying the connection, since that will only take a few minutes. If that
doesn't help, do a full Omeka restore (DB and files), presuming you don't
just have an easy machine snapshot you can recover. Worst case scenario is
to fall back on your bare metal recovery procedure.
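
A quick way to verify the connection is to read the credentials Omeka is
actually using and try them. A minimal sketch in Python -- it assumes Omeka
Classic's db.ini layout, the pymysql package, and the default "omeka_"
table prefix, so adjust paths and names to your install:

import configparser
import pymysql

cfg = configparser.ConfigParser()
cfg.read("/var/www/omeka/db.ini")  # path is an assumption
db = {k: cfg["database"][k].strip('"')
      for k in ("host", "username", "password", "dbname")}

conn = pymysql.connect(host=db["host"], user=db["username"],
                       password=db["password"], database=db["dbname"])
with conn.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM omeka_options")  # default prefix assumed
    print("options rows:", cur.fetchone()[0])
conn.close()

If that fails, the "not installed" redirect is a database problem, not a
mod_rewrite one.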

Kyle

> On May 9, 2016, at 1:29 PM, Justin Snow  wrote:
> 
> I have a few platforms (domain.com/omeka, domain.com/wordpress, 
> domain.com/omeka-s) living alongside each other on an Azure cloud server 
> running Ubuntu 14.04.4 and Apache 2.4. A few days ago, I tried installing an 
> additional platform, but doing so seemed to break everything else on the 
> server, so I removed it and reinstalled Apache. After editing the .htaccess 
> file to remove a typo, everything worked. I left for the weekend, came back 
> to work this morning, and everything is broken again.
> 
> By broken, I mean every URL on the server redirects to a mod_rewrite 
> installation error page for Omeka (domain.com/install). Omeka has already 
> been installed and running for months. MR needs to be enabled for Omeka to be 
> installed and function properly. MR is definitely enabled.
> 
> 
> $ sudo a2enmod rewrite
> Module rewrite already enabled
> 
> 
> I'm assuming this is a problem with either the /var/www/.htaccess file or the 
> /etc/apache2/apache2.conf file. Here's the relevant .htaccess code that is 
> default for Omeka:
> 
> 
> RewriteEngine on
> 
> RewriteCond %{REQUEST_FILENAME} -f
> RewriteRule !\.php$ - [C]
> RewriteRule .* - [L]
> 
> RewriteRule ^install/.*$ install/install.php [L]
> RewriteRule ^admin/.*$ admin/index.php [L]
> RewriteRule .* index.php
> 
> 
> And here's the relevant(?) apache2.conf code:
> 
> 
> <Directory />
>Options FollowSymLinks
>AllowOverride All
>Require all denied
> </Directory>
> 
> <Directory /usr/share>
>AllowOverride None
>Require all granted
> </Directory>
> 
> <Directory /var/www/>
>Options Indexes FollowSymLinks MultiViews
>AllowOverride All
>Require all granted
> </Directory>
> 
> AccessFileName .htaccess
> 
> 
> If I remove the .htaccess file, or if I just remove some of RewriteRules, 
> everything on the server is inaccessible (including Wordpress & Omeka-S). I'm 
> clearly missing something. Any ideas?


Re: [CODE4LIB] using drupal for a document repository

2016-05-06 Thread Kyle Banerjee
> On May 6, 2016, at 8:37 AM, Joshua Klingbeil  wrote:
> 
> ...These, and other req-cons can help you to better understand what type of
> investment should be considered for your project...  However, going through 
> the process may help you to determine if it feels more like you're trying to 
> cram your
> needs into a specific product...

This.

One factor that is especially important for smaller institutions is what skill 
sets are required. Solutions that utilize technologies you already support are 
preferable because it's hard to do things well if you're spread too thin. Plus, 
there will be heck to pay if you depend on individuals with unique combinations 
of skills as they will eventually leave through the natural passage of time.

Having said that, no matter which route you take, you'll need a good way to 
migrate things in and out. If you can do that, you'll be safe.

Kyle


Re: [CODE4LIB] Form fill from URL

2016-04-25 Thread Kyle Banerjee
On Fri, Apr 22, 2016 at 4:58 PM, Teague Allen 
wrote:

> Hello collective,
>
> I've been given the opportunity to replace a much-detested PDF form used
> to request cataloging for items by our researchers that are published
> outside our organization. My hope is to create a web form that will
> automatically populate with title, author(s), and appropriate citation
> information if a URL/URI is entered.


This really depends on the URL. My concern would be that asking someone to
provide a URL is more onerous than filling out citation information: the
latter is error tolerant and can be done from memory, while anyone who can
provide a URL will already have/know all the relevant citation info.

If I understand your need correctly, I'd consider listing ISBN/ISSN at the
top of your form as an optional field. When a javascript event detects that
a complete number has been entered, use your method of choice to retrieve
the other info and fill the unpopulated fields. Even that wouldn't be great,
since presenting an ISBN/ISSN as optional is tantamount to asking people to
look up this info and key it in -- which takes more time than keying in an
author and title. However, it could lead to more complete forms and reduce
confusion for everyone.
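
For the "method of choice" piece, one option is a tiny server-side lookup
that the form's javascript calls once the number looks complete. A sketch
using Python and the Open Library books API -- just one of many possible
services, and the field mapping is illustrative:

import requests

def lookup_isbn(isbn):
    resp = requests.get(
        "https://openlibrary.org/api/books",
        params={"bibkeys": f"ISBN:{isbn}", "format": "json", "jscmd": "data"},
        timeout=10)
    data = resp.json().get(f"ISBN:{isbn}", {})
    return {"title": data.get("title"),
            "authors": [a["name"] for a in data.get("authors", [])],
            "publish_date": data.get("publish_date")}

print(lookup_isbn("9780140328721"))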

Before doing anything, I'd make sure I understood why this form is so
unpopular and address those issues. Otherwise, you may wind up with another
unpopular form that just happens to have a little more automation built
into it.

kyle


Re: [CODE4LIB] Good Database Software for a Digital Project?

2016-04-16 Thread Kyle Banerjee
On Sat, Apr 16, 2016 at 7:15 AM, Matt Sherman 
wrote:

> Thanks for all the advice folks, this gives me a lot to look into.  You all
> have certainly made me table MySQL, so now to look into PostgreSQL, Solr,
> XTF, and some of these other technologies to see what would be the best
> fit.
>

For a project like this, you can make just about any solution work.

Maintaining apps that depend on technologies you don't use for anything
else is a pain, so I'd be inclined to avoid the overhead of learning
something new for this project unless it is part of your long-term
objectives.

For example, solr or postgres are both viable here, but those two
applications are good for very different purposes. So if you use one of
those, pick the one that will be the most useful down the road.

kyle


Re: [CODE4LIB] Good Database Software for a Digital Project?

2016-04-15 Thread Kyle Banerjee
On Fri, Apr 15, 2016 at 11:53 AM, Roy Tennant  wrote:

> In my experience, for a number of use cases, including possibly this one,
> a database is overkill. Often, flat files in a directory system indexed by
> something like Solr is plenty and you avoid the inevitable headaches of
> being a database administrator. Backup, for example, is a snap and easily
> automated.
>

I'm with Roy -- no need to use a chain saw to cut butter.

Out of curiosity, since the use case is an annotated bibliography, how much
stuff do you have? If you have only a few thousand entries in delimited
text, flat files could be easier and more effective than other options.

kyle


Re: [CODE4LIB] authority work with isni

2016-04-15 Thread Kyle Banerjee
On Fri, Apr 15, 2016 at 2:16 AM, Eric Lease Morgan  wrote:

> ...
> My questions are:
>
>   * What remote authority databases are available programmatically? I
> already know of one from the Library of Congress, VIAF, and probably
> WorldCat Identities. Does ISNI support some sort of API, and if so, where
> is some documentation?
>

Depends on what you have in mind. For databases similar to your example, I
believe ORCID has an API. GNIS, ULAN, CONA, and TGN might be interesting to
you, but there are tons more, particularly if you add subject authorities
(e.g. AAT, MeSH). The Getty stuff is all available as LoD.

  * I believe the Library Of Congress, VIAF, and probably WorldCat
> Identities all support linked data. Does ISNI, and if so, then how is it
> implemented and can you point me to documentation?
>
>   * When it comes to updating the local (MARC) authority records, how do
> you suggest the updates happen? More specifically, what types of values do
> you suggest I insert into what specific (MARC) fields/subfields? Some
> people advocate $0 of 1xx, 6xx, and 7xx fields. Other people suggest 024
> subfields 2 and a. Inquiring minds would like to know.
>

Implementation would be specific to your system and those you wish to
interact with. MARC is used to represent/transmit data, but the record
doesn't really exist inside most systems -- they rarely use it internally
as is.

Having said that, I think the logical place to put control numbers from
different schemes is in 024 because that field allows you to differentiate
the source, so it doesn't matter if control numbers overlap.
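
In practice that means an 024 with first indicator 7 and the source named
in $2. A pymarc sketch (assuming pymarc 5.x; the ISNI value is a
placeholder):

from pymarc import Record, Field, Subfield

record = Record()
record.add_field(Field(
    tag="024",
    indicators=["7", " "],  # 7 = source specified in subfield $2
    subfields=[Subfield("a", "0000000000000000"),  # placeholder ISNI
               Subfield("2", "isni")]))
print(record)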

kyle


Re: [CODE4LIB] Google can give you answers, but librarians give you the right answers

2016-04-10 Thread Kyle Banerjee
On Fri, Apr 8, 2016 at 5:04 PM, Karen Coyle  wrote:

> The percentage of things that have decent LCSH assigned to them is
>> small
>> and shrinking for the simple reason that fewer and fewer humans have to
>> manage more resources.
>>
>
> I'm not sure what you are saying here -- that there are fewer headings
> being assigned, or that they are not as "good" as ones assigned in the
> past? Or is it that many of our resources aren't covered by library
> cataloging rules?
>

All of the above. The number of resources continues to grow, fewer people
assign subject headings, and the amount of training those people have
declines. The norm nowadays is for libraries to perform little to no
original cataloging themselves. Vendor-created record sets especially are
full of records that lack any LCSH headings, let alone good ones.


> LCSH is relatively flat, the rules for constructing headings are so
>> Byzantine that they stymie even experienced catalogers (which contributes
>> to inconsistent application in terms of quality, level of analysis, and
>> completeness), and its ability to express concepts at all is highly
>> variable as it is designed by a committee on an enumerative basis.
>>
>
> ?? Sorry, what's this "enumerative basis"?
>

LCSH is based on literary warrant, meaning that a subject doesn't exist
until needed for an actual item in front of someone at the Library of
Congress or a SACO library. Rather than relate things in a conceptual
universe, LCSH expands on an ad hoc basis.
http://www.loc.gov/aba/pcc/saco/sacogenfaq.html describes the basic
process. LCSH isn't a bad general vocab, but it's not good in specialized
areas because neither the catalogers nor those creating headings have the
expertise to assign helpful subjects. Patrons and staff where I work regard
the headings as unhelpful noise so we don't display LCSH facets (MeSH is a
better fit for our needs).


>   Add to this that concepts in records frequently must be expressed across
>> multiple
>> headings and subheadings, and any type of automated assignment is going to
>> result in really "dirty" relationships so I can't blame ILS designers for
>> limiting their use of LCSH primarily to controlled keyword access.
>>
>
> Well, actually, there's nothing at all "controlled" about keyword access.
> It's pretty much a pot shot, or, as I've called it before, a form of
> dumpster diving for information. There is a huge disconnect between the
> results of keyword searching and the expected functionality (read: user
> service) of controlled left-anchored headings, and I continue to be amazed
> that we've been living with this disconnect for decades without ever coming
> to an agreement that we need a solution.[1] Instead, great effort goes into
> modifying the descriptive cataloging rules, while no major changes have
> been made in subject access. I find this to be... well, stunning, in the
> sense that I'm pretty much stunned that this is the case.


People like the imagery of choosing from a browse list, but they're not
going to guess the left-anchored headings because the preferred terminology
and word order often won't be the same as what they're thinking. When they
do type everything right, the absolutely insane number of unique
precoordinated subject strings returned would be overwhelming unless there
is little on the subject they seek.

I agree that using subject cataloging rules designed for filing paper
cards in a computerized environment is insane. But even if the rules were
updated, fixing existing records or creating rich relationships between
LCSH terms would both be impossible. The most practical thing to do is to
do a keyword search on the headings and then return facets based only on
650 $a (i.e., ignore the rest) -- which is what most catalogs do.
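
The indexing side of that approach is simple enough. A rough pymarc sketch
of building the two kinds of fields -- the field names are invented, and
your Solr schema would decide what's tokenized vs. treated as an exact
string:

from pymarc import MARCReader

docs = []
with open("records.mrc", "rb") as fh:
    for record in MARCReader(fh):
        doc = {"subject_kw": [], "subject_facet": []}
        for f in record.get_fields("650"):
            # whole heading, destined for a tokenized keyword field
            doc["subject_kw"].append(
                " -- ".join(f.get_subfields("a", "x", "y", "z", "v")))
            # $a only, kept whole for faceting
            doc["subject_facet"].extend(f.get_subfields("a"))
        docs.append(doc)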

kyle


Re: [CODE4LIB] Software used in Panama Papers Analysis

2016-04-08 Thread Kyle Banerjee
On Fri, Apr 8, 2016 at 8:13 AM, Jenn C  wrote:

> I worked on a text mining project last semester where I had a bunch of
> magazines with text that was totally unstructured (from IA). I would have
> really liked to know how to work entity matching into such a project. Are
> there text mining projects out there that demonstrate doing this?
>

What did you use for entity identification? My gut reaction would be to
look at what the entity extractor pulled out and then normalize the source
in the hopes of improving the accuracy. Even when controlled vocab is not
used, normalizing data makes a massive difference.
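
By normalizing I mean the cheap, boring stuff -- case, punctuation,
accents, whitespace -- applied before matching. An illustrative Python
sketch (not anything from the actual Panama Papers pipeline):

import re
import unicodedata

def normalize(name):
    name = unicodedata.normalize("NFKD", name)
    name = "".join(c for c in name if not unicodedata.combining(c))  # accents
    name = re.sub(r"[^\w\s]", " ", name).lower()  # punctuation
    return re.sub(r"\s+", " ", name).strip()

print(normalize("  Señora MARÍA-LUISA Pérez "))  # -> senora maria luisa perez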

I am curious as to what the data for the Panama Papers looked like going
in. I would think significant normalization and structuring would be
necessary to leverage the advantages of using Blacklight over other tools.

kyle


Re: [CODE4LIB] including data from static JSON file in Javascript

2016-04-06 Thread Kyle Banerjee
If all you want to do is load external json as a string, you can do it
using syntax almost identical to what you suggest. Just change your
data.json file so the content is

var data = ' [include your json here, be sure to escape things properly]';

Then just load this file before your external script, e.g.:

<script src="data.json"></script>
<script src="external_script.js"></script>

Within external_script.js, you can reference the data variable just as you
would had it been defined in external_script.js itself.

Depending on what you're doing with your json, it may or may not be a good
approach, but it will work.

kyle

On Wed, Apr 6, 2016 at 6:02 PM, Ken Irwin  wrote:

> Hi folks,
>
> I'm working on a javascript project that currently has a bunch of JSON
> data defined inside the script; I'd like to move the data to a file outside
> the JS file so it can be updated without touching the script, but I'm
> running up against a few challenges externalizing the data.
>
> The static JSON file lives in the same directory with the script.
>
> If I had my druthers, I'd to it PHP style, but I don't think JS works this
> way:
> =
> External file:
> [all this JSON]
>
> Script:
> var data = include ('data.json');
> 
> All the options I find for loading external files are all AJAX-y, whereas
> what I really want is something synchronous - the script doesn't go on
> until the data loads.
> I've also had some lexical scope issues where I can get the data inside
> the getJSON() function, but then have trouble transporting the data out of
> that function into the rest of the script.
>
> Does anyone know of a good way to accomplish this? I imagine there's some
> incantation that I can perform, but I'm struggling to find it.
>
> Thanks,
> Ken
>


Re: [CODE4LIB] Google can give you answers, but librarians give you the right answers

2016-04-06 Thread Kyle Banerjee
On Wed, Apr 6, 2016 at 7:42 AM, Karen Coyle  wrote:

> ... Libraries "do" it, but our user interfaces ignore it (honestly, does
> anyone NOT think that the whole BT/NT relationship in LCSH is completely
> wasted in today's systems?).  Google searches "work" best on proper nouns
> that are nearly unique. You cannot do concept searches, and you cannot see
> relationships between concepts. It's great for named people, organizations
> and products, but not great for anything else.[1]...


Conceptually, I like the idea of using the relationships in LCSH. However,
I don't hold out much hope that anyone will make hay out of that.

The percentage of things that have decent LCSH assigned to them is small
and shrinking for the simple reason that fewer and fewer humans have to
manage more resources. Automation could help (getting the needed data from
publishers might be tricky), but the only benefit I can think of for using
LCSH for automated applications is to maximize relationships with older
materials -- possibly at the expense of the "findability" of the newer
stuff.

LCSH is relatively flat, the rules for constructing headings are so
Byzantine that they stymie even experienced catalogers (which contributes
to inconsistent application in terms of quality, level of analysis, and
completeness), and its ability to express concepts at all is highly
variable as it is designed by a committee on an enumerative basis. Add to
this that concepts in records frequently must be expressed across multiple
headings and subheadings, and any type of automated assignment is going to
result in really "dirty" relationships, so I can't blame ILS designers for
limiting their use of LCSH primarily to controlled keyword access.

kyle


Re: [CODE4LIB] Google can give you answers, but librarians give you the right answers

2016-04-01 Thread Kyle Banerjee
On Thu, Mar 31, 2016 at 9:31 PM, Cornel Darden Jr.  wrote:

>
> "Google can give you answers, but librarians give you the right answers."
>
> Is it me? Or is there something wrong with this statement?
>

There's nothing wrong with the statement. As is the case with all sound
bites, it should be used to stimulate thought rather than express reality.

Librarians have a schizophrenic relationship with Google. We dump on Google
all the time, but it's one of the tools librarians of all stripes rely on
the most. When we build things, we emulate Google's look, feel, and
functionality. And while we blast Google on privacy issues, human
librarians know a lot about what the individuals they serve use, why, and
how -- it is much easier to get anonymous help from Google than from a
librarian.

There are many animals in the information ecosystem, libraries and Google
being among them. Our origins and evolutionary path differ, and this
diversity is a good thing.

kyle


Re: [CODE4LIB] Public Health Metadata

2016-03-20 Thread Kyle Banerjee
BTW, I hope you share the solution you decide to implement.

Public health research goes on at a lot of institutions (including mine),
and I'm always looking for ways to address weaknesses in our current
practices/systems.

kyle

On Mon, Mar 14, 2016 at 11:43 AM, Jacob Ratliff 
wrote:

> MeSH is a little helpful, but it is slightly different than the realm of
>  public health, which spends a lot of time on the systems surrounding
> health, as well as the health areas themselves. (e.g. Pharmacy supply chain
> management). That's the direction I'm heading though!
>
> Jacob
>
> On Mon, Mar 14, 2016 at 2:35 PM, Carol Bean  wrote:
>
> > MeSH?
> >
> > Sent from my iPhone
> >
> > > On Mar 14, 2016, at 1:22 PM, Jacob Ratliff 
> > wrote:
> > >
> > > Hi all,
> > >
> > > I currently work in an International public health non-profit, and we
> > are setting up enterprise wide document management for dealing with
> > Knowledge Management and Information Management issues. Lots of moving
> > pieces, but I wanted to get some input on metadata specific to the
> > medical/health world. I am looking for some metadata guidance
> specifically
> > related to the medical/health world. Is anyone using any standard
> > controlled vocabularies? Should I be looking into Linked Data? I'm
> starting
> > off the research phase for all of the metadata, so links to resources and
> > case studies is greatly helpful!
> > >
> > > Bonus points to anything that is international in scope, as over 75% of
> > the employees at my company are non-US based (most of them in Africa).
> > >
> > > Thanks,
> > >
> > > Jacob Ratliff
> > > Information Architect / UX Specialsit
> > > Management Sciences for Health
> > > jaratlif...@gmail.com
> >
>


Re: [CODE4LIB] Do you use alt tags in your images for digital collections

2016-03-19 Thread Kyle Banerjee
On Thu, Mar 17, 2016 at 4:39 PM, Erica FINDLEY  wrote:

> Good evening,
>
> We are currently experiencing a dilemma with alt tags in our digital
> collections.
>
> We would like to include alt tags to be in compliance with accessibility
> guidelines.
>
> When looking at an item detail page
> , there is a lot
> of surrounding metadata to help visualize the image, but on our search
> results  pages, that detail is
> not present. Currently a screen reader is not reading the titles of the
> images on our search results page.
>
> We are able to add alt tags to the image to help with this. Our dilemma is
> what those tags should be so they are not redundant of either the title or
> description metadata, but still helpful.
>

Short answer: Give Talking Books and Braille Services at the Oregon State
Library or the Disability Resource Center at PSU a buzz. Either of those
outfits can hook you up. You need specific advice about how your site can
deliver the best service for blind people rather than general accessibility
info.

Longer answer: As someone who worked and lived with blind people for a
number of years, my personal reactions when I viewed the site with my eyes
rather than a screen reader were the following:

   - If screen readers don't read the title, it's a good idea to figure out
   why and address that. Screen readers can be thought of like browsers -- you
   need to make sure the most popular ones work with your site.

   Once the title is properly read, put in a blank alt so blind patrons
   don't get slowed down by an element that does nothing for them and connects
   them to a viewer they can't use. If they want the image, there is already a
   clearly marked download button.

   Accessibility guidelines say to put in alt, but blind people I've known
   tell me adding useless elements that slow things down and interrupt the
   flow is worse than doing nothing. If the title is read and they can
   download the image, the function of the alt for that picture is already
   covered.

   - If you can't figure out how to make the title visible to major screen
   readers, put the title in the alt. It will be obvious, help with search
   engine optimization, and it doesn't cause problems for other users.

   - I would definitely talk to a couple blind people to get their
   reactions. Photographs are inherently visual, and understanding how they
   might use this site is critical to improving it. Blind people do a lot of
   things sighted people don't imagine they would, but there are other things
   that make no sense for them. For example, finding blind people who love
   movies and TV is easy, but I have yet to meet one who enjoyed cartoons
   because the medium is inherently visual and the sound alone makes no sense.

kyle


Re: [CODE4LIB] Public Health Metadata

2016-03-15 Thread Kyle Banerjee
Howdy Jacob,

One thing you'll want to consider in choosing a vocabulary is to find one
that's optimized for purposes/topics similar to yours.

For example, SNOMED is designed to provide standardized terminology for
storing/retrieving information from clinical care EHRs, ICD-10 is for
reporting out for statistical purposes, and MeSH was designed for indexing
articles, books, documents, etc in the life sciences. There are other
vocabularies designed for other uses.

These three schemes all have hierarchical and/or synthetic codes in the
background (with ICD-10, the codes are actually in the foreground with the
vocab in the background) that lend themselves towards relating concepts in
an internationalized database of resources in many languages -- SNOMED
specifically has an international version. There are good tools for these
schemes (you'll even find ones that assign terms based on textual
analysis), and it's easy to get the data so you can create your own tools
if needed.

Is the idea that the values of Target Populations are kept separate from
Technical Area and Expertise Area? If so, you might consider using more
than one vocabulary or even maintaining your own hierarchy. For example,
you may find that no vocabulary expresses Target Population the way you
need, and there are probably few enough terms that it would be feasible to
maintain your own hierarchy for that area.

kyle

On Tue, Mar 15, 2016 at 6:51 AM, Jacob Ratliff <jaratlif...@gmail.com>
wrote:

> All good questions, and most of which we are still in the process of
> determining. The types of documents are generally project related
> documentation (reports, plans, technical information, etc.), but those have
> not yet been standardized. We are also in the process of doing business
> analysis and user testing to determine the types and amount of metadata we
> need. I just know for sure that we will need "subject" or "topic"
> descriptors for all of the content. This has currently been narrowed down
> to "Technical Area" (i.e. the actual health issue), target populations
> (e.g. Newborns), and Expertise Areas (e.g. Monitoring and evaluation), but
> all of that is subject to change based on research and user testing.
>
> As for the process to assign terms and the structure of the terms, all of
> that is up in the air right now as we have no systems (business or
> technological) to implement any of this. In an ideal world flush with
> resources, I would be able to get a metadata/vocabulary management tool, as
> well as a robust document/content management system that can work together,
> as well as integrate with our other data and business intelligence systems
> that are being created. There are currently a number of evaluative/business
> analysis workstreams moving forward to try and answer some of these
> questions. In reality, there is a very good chance a lot of this will need
> to be managed through excel and good governance.
>
> The systems and vocabularies will also have to useful worldwide, as 80% of
> our employees are located outside of the US (we aren't even talking about
> multiple languages right now; that's a problem that is not even worth
> considering at this point).
>
> The suggestions everyone has given so far are very helpful and are putting
> me on the right track. My hope is to be able to use one of them, or at
> least part(s) of them to get to where we need to go.
>
> Hopefully in a few months I will have a good update (and more questions) on
> where we are!
>
> Thanks,
>
> Jacob
>
> On Mon, Mar 14, 2016 at 4:55 PM, Kyle Banerjee <kyle.baner...@gmail.com>
> wrote:
>
> > Could you say a bit more about the documents you need to manage, the
> level
> > of specificity you need, how they'll be used, and what process you
> envision
> > to assign terms? If your documents are mostly clinical in nature, SNOMED
> > strikes me as a good choice, but if you want terminology that could take you
> > to related articles in PubMed or your needs aren't mostly clinical, MeSH
> > might work better.
> >
> > It's possible to crosswalk across vocabularies, but the different
> > vocabularies are optimized to support different needs so you'll want to
> > pick one that's appropriate for your use.
> >
> > kyle
> >
> > On Mon, Mar 14, 2016 at 11:22 AM, Jacob Ratliff <jaratlif...@gmail.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > I currently work in an International public health non-profit, and we
> are
> > > setting up enterprise wide document management for dealing with
> Knowledge
> > > Management and Information Management issues. Lots of moving pieces,
> but
> > I
> > > wanted to get some input on metadata spe

Re: [CODE4LIB] Public Health Metadata

2016-03-14 Thread Kyle Banerjee
Could you say a bit more about the documents you need to manage, the level
of specificity you need, how they'll be used, and what process you envision
to assign terms? If your documents are mostly clinical in nature, SNOMED
strikes me as a good choice, but if you want terminology that could take you
to related articles in PubMed or your needs aren't mostly clinical, MeSH
might work better.

It's possible to crosswalk across vocabularies, but the different
vocabularies are optimized to support different needs so you'll want to
pick one that's appropriate for your use.

kyle

On Mon, Mar 14, 2016 at 11:22 AM, Jacob Ratliff 
wrote:

> Hi all,
>
> I currently work in an International public health non-profit, and we are
> setting up enterprise wide document management for dealing with Knowledge
> Management and Information Management issues. Lots of moving pieces, but I
> wanted to get some input on metadata specific to the medical/health world.
> I am looking for some metadata guidance specifically related to the
> medical/health world. Is anyone using any standard controlled vocabularies?
> Should I be looking into Linked Data? I'm starting off the research phase
> for all of the metadata, so links to resources and case studies is greatly
> helpful!
>
> Bonus points to anything that is international in scope, as over 75% of
> the employees at my company are non-US based (most of them in Africa).
>
> Thanks,
>
> Jacob Ratliff
> Information Architect / UX Specialsit
> Management Sciences for Health
> jaratlif...@gmail.com
>


Re: [CODE4LIB] php and email

2016-02-26 Thread Kyle Banerjee
>
> Our library has a website run on PHP.  The university IT would not help to
> set up email capability via Web.  My question is, what are the options
> there that I can add email notification capability to our website, and how?
>
> Our server is Windows 2008r2, PHP5.6, IIS 7.5.
>

Does university IT know you intend to run your own mail server, and are
they OK with your intended use? If not, you might want to touch base to
ensure you don't find yourself with a blocked service or worse.

kyle


Re: [CODE4LIB] Listserv communication, was RE: Proposed Duty Officer

2016-02-26 Thread Kyle Banerjee
> You're also always going to have trouble with getting people to ask
> questions, unless the concept of asking for help/guidance has been drilled
> into them as not stupid, but constructive, for a very long time. I'm
> talking life span.
>

Responses people expect are also a barrier to participation. Multiple
people have told me offline they don't ask for help because the answers
make them feel dumb. They know people don't mean to make them feel that
way, but it's still an issue. Especially for newer members, answers that
use excessive jargon, require skills/knowledge not inherent to the question
to make sense, dismiss approaches/suggestions as wrong, or push solutions
that involve steep learning curves discourage discussion.

Some people don't post because they don't recognize the value of their own
contributions. They assume those with more experience/skills have better
ideas when that's often not true. As a result, only a handful of over 3000
list members post anything when you know many of the others have all kinds
of great ideas.

Regarding what questions belong on the list and what don't, I don't think
there's any risk of c4l getting flooded with irrelevant questions.
Most postings on this list seem to be about events, positions, reports,
awards, etc. that are of interest only to some and that's not a bad thing.
IMO, it would be a good thing to have more tech in the mix and more
diversity in the tech topics discussed.

Besides, a lot of the best stuff is learned by accident. That's hard to do
with tightly focused questions in tightly focused venues -- the problem at
hand gets solved, but broader implications and opportunities to apply the
ideas elsewhere may be missed.

kyle


Re: [CODE4LIB] [code4libcon] Proposed Duty Officer

2016-02-25 Thread Kyle Banerjee
On Wed, Feb 24, 2016 at 4:36 PM, Becky Yoose  wrote:

> Apologies for the short reply with my manager's hat firmly in place -
> transparency is good, but there are times when a particular process or
> discussion should not be public. Given the sensitive nature of some of the
> feedback that might be presented about particular individuals, transparency
> would not be a good fit for the feedback process.
>

For clarity, am I correct in understanding we are collecting feedback only
on those volunteering to become duty officers, and not on those who
compile/manage harassment information nor on those responsible for
determining what actions to take in response to incidents of harassment?

kyle


Re: [CODE4LIB] [code4libcon] Proposed Duty Officer

2016-02-24 Thread Kyle Banerjee
Fully agreed that anonymity is sometimes necessary to protect individuals.

My interpretation of the email I responded to was that the anonymous form
was for feedback for the idea of the proposed duty officers rather than the
suitability of particular individuals to fill this role.

My apologies to everyone if I have misunderstood.

If the idea is to collect feedback pertaining to specific individuals, I
believe it would have been more appropriate to collect anonymous feedback
that potentially included everyone (rather than a select few) so that
suitability concerns could be resolved before people put their name on a
volunteer list. As things are now, anyone on the duty officer list who
doesn't wind up serving for any reason might be wrongly assumed to have
been barred for being a harasser regardless of any public explanation.

I hope that the process for resolving accusations would be a matter of
public discussion.

kyle

On Wed, Feb 24, 2016 at 4:36 PM, Becky Yoose <b.yo...@gmail.com> wrote:

> Apologies for the short reply with my manager's hat firmly in place -
> transparency is good, but there are times when a particular process or
> discussion should not be public. Given the sensitive nature of some of the
> feedback that might be presented about particular individuals, transparency
> would not be a good fit for the feedback process.
>
> Thanks,
> Becky
>
> On Wed, Feb 24, 2016 at 4:28 PM, Eric Phetteplace <phett...@gmail.com>
> wrote:
>
> > I think we're all perfectly fine with discussing this issue in the open,
> by
> > all means let's do that. The Code of Conduct on GitHub is a shining
> example
> > of this; the whole discussion is in the open and you can see the
> > conversations around particular passages unfold in the issues queue. The
> > problem is discussing specific concerns one has with *individuals.* That
> > does not feel appropriate for a public listserv, whether we're talking
> > about a victim, harasser, or potential duty officer.
> >
> > Perhaps I'm misunderstanding, but I do not see how the inability to voice
> > concerns about individuals stops us from having a general conversation on
> > how to be an inclusive and safe community. Much as we can "improve
> > everyone's skills", as preconferences of the past have done, while *also*
> > having designated duty officers with a specific responsibility. These are
> > not mutually exclusive and indeed are complimentary.
> >
> > Best,
> > Eric
> >
> > On Wed, Feb 24, 2016 at 3:25 PM, Esmé Cowles <escow...@ticklefish.org>
> > wrote:
> >
> > > We live in a world where there are repercussions for calling out people
> for
> > > sexual harassment.  Not to put too fine a point on it, we live in a
> world
> > > where people were recently sued for doing just that.  So I think it's
> > > completely necessary to have an anonymous method of raising concerns,
> if
> > > you really want people to raise concerns with the conference
> organizers.
> > >
> > > -Esmé
> > >
> > > > On Feb 24, 2016, at 6:12 PM, Kyle Banerjee <kyle.baner...@gmail.com>
> > > wrote:
> > > >
> > > >> Feedback about proposed duty officers can be emailed to directly to
> > me,
> > > >> chadbnel...@gmail.com, or submitted via this anonymous form
> > > >> <http://goo.gl/forms/YKfWRwyiOr>.
> > > >>
> > > >
> > > >
> > > > It's unfortunate people feel a need to move discussions offline -- I
> > > > interpret this as meaning some people are afraid of repercussions for
> > > > respectfully sharing thoughts on an issue that affects everyone.
> > > >
> > > > I believe we agree as a community we cannot be our best if the ideas
> > and
> > > > talents of any group are excluded. I believe we agree specific
> measures
> > > are
> > > > needed to overcome structural barriers and provide opportunities to
> > broad
> > > > groups of people who still can't participate in the technology
> > community
> > > on
> > > > an equal basis.
> > > >
> > > > To be direct, I have concerns about the duty officer idea.  I support
> > the
> > > > motivation behind the concept 100%. I have great respect for the
> people
> > > who
> > > > have stepped up on this issue, both as technologists and as people in
> > > > general.
> > > >
> > > > Being a self selected group, c4l has problems found in society at
> > large.
> > > If
> > > > 

Re: [CODE4LIB] [code4libcon] Proposed Duty Officer

2016-02-24 Thread Kyle Banerjee
> Feedback about proposed duty officers can be emailed to directly to me,
> chadbnel...@gmail.com, or submitted via this anonymous form
> .
>


It's unfortunate people feel a need to move discussions offline -- I
interpret this as meaning some people are afraid of repercussions for
respectfully sharing thoughts on an issue that affects everyone.

I believe we agree as a community we cannot be our best if the ideas and
talents of any group are excluded. I believe we agree specific measures are
needed to overcome structural barriers and provide opportunities to broad
groups of people who still can't participate in the technology community on
an equal basis.

To be direct, I have concerns about the duty officer idea.  I support the
motivation behind the concept 100%. I have great respect for the people who
have stepped up on this issue, both as technologists and as people in
general.

Being a self selected group, c4l has problems found in society at large. If
the conference is at least as safe as other environments attendees
encounter such as airports, streets, bars, and restaurants, I would hope
the conference organizers could address issues when self policing (i.e.
people looking out for each other) proved inadequate.

My concern is that while harassment and assault are real issues, they have
taken on a life of their own and divert too much focus from helping people
and improving everyone's skills to protecting people from attack. I fear
these well meaning measures do not improve safety and possibly harden the
few miscreants they're intended to mitigate.

I hope my words will be perceived in the spirit intended.

kyle


Re: [CODE4LIB] Best way to handle non-US keyboard chars in URLs?

2016-02-21 Thread Kyle Banerjee
>
> > 3) Who type un-shortened URLs any more?
>
> I'm looking for responses that solve this rather than dismiss Intuitive
> URLs.
>

The question is what the use case you're trying to solve looks like. Is the
goal typability because it's hand transcribed from a business card, knowing
what the link connects to without following it, identification, search
engine optimization, something else, a combination of these things, etc?

Hand typed URLs that are not copied and pasted need to be short and not
look like gibberish. Transliteration schemes not known to people typing
them confuse rather than help. Whatever route you take, you'll be
optimizing one or more objectives at the expense of something else.

kyle


Re: [CODE4LIB] searching metadata vs searching content

2016-01-27 Thread Kyle Banerjee
A couple things come to mind. The first is that you'll need to experiment a
bit to get the behavior that works for your situation and your users --
common solutions that work great in other environments may not work for you.

The second is that if you haven't already, you should see if nested solr
documents aren't appropriate for your use case as that seems the right
structure for dealing with compound objects.
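
For a sense of what nesting looks like, here's a hedged sketch of a
parent/child Solr document for something like a yearbook, posted with
Python. The core URL and field names are made up; _childDocuments_ is
Solr's nesting key:

import requests

yearbook = {
    "id": "yearbook-1922",
    "type": "compound",
    "title": "Commencement Program, 1922",
    "_childDocuments_": [
        {"id": "yearbook-1922-p001", "type": "page", "page": 1,
         "text": "full text of page one ..."},
        {"id": "yearbook-1922-p002", "type": "page", "page": 2,
         "text": "full text of page two ..."},
    ],
}

resp = requests.post(
    "http://localhost:8983/solr/mycore/update?commit=true",  # assumed core
    json=[yearbook])
print(resp.status_code)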

kyle

On Wed, Jan 27, 2016 at 11:28 AM, Laura Buchholz 
wrote:

> Thanks, Shaun--that overview was great. More info is better than less! I
> think this is the step that I want to know more about:
> In the case of a “compound object” you may need to have a script iterate
> over lots of separate content files and add them to the Solr document that
> represents a yearbook.
>
> Is is common to add all text content from the multi-page yearbook to one
> Solr field? So, the script would essentially extract and concatenate text
> from the multiple full-text files that the METS record points to and add it
> to one Solr field? That would make sense to me.
>
> However, when the user selects the item from a results set, they expect to
> be taken to the place within the item that contains their search term, or
> to not have to do much if any work to figure out where their search term
> is. At least, our users are accustomed to that behavior and expect the
> application to do that work for them. For example, a search for "ethel
> knotts" in OregonDigital
> <
> http://oregondigital.org/catalog/?utf8=%E2%9C%93_field=all_fields=ethel+knotts
> >
> gives
> some results, and a user can select the first item (Commencement Program,
> 1922) and can see the location pin for the file that contains their term. I
> thought some institutions automatically open the item to the first result,
> but now that I'm trying to find examples to cite, I'm not seeing that
> happen.
>
> Would this probably work by having the application do a second search
> (without the user needing to know) within the item after the user selects
> it? That search would be triggered by the IA bookreader, in the case of
> Oregon Digital, it seems. Or is something else happening? To get this
> functionality, the application would have to know which ranges of text
> belong to which files, and I'm curious about how that info would be stored
> and provided, whether in METS or Solr or something else.
>
> For better general context to these questions: I'm trying to understand how
> things are commonly done so I can better talk with our developer, who is in
> campus IT. We will be leaving ContentDM and going with a homegrown system
> that uses Solr among other components. We don't have any METS records, but
> when I think of structural metadata records, I think METS. If there's other
> ways of structuring metadata and content to provide the same functionality,
> that's good too.
>
> Thanks again for your help!
>
> On Tue, Jan 26, 2016 at 8:24 PM, Shaun D. Ellis 
> wrote:
>
> > Hi Laura,
> > Great question.  Unfortunately, I think you’re going to be fairly limited
> > when it comes to having granular control over fields and facet indexing
> in
> > ContentDM (someone correct me if I’m wrong).
> >
> > But to answer your question about general steps involved with indexing
> the
> > metadata AND full text of a METS document…
> >
> > To have the most control over how your data is indexed, you will want to
> > use a search platform.  Apache Solr is
> > used in a majority of library-related software, so I’ll use that in my
> > examples, although there are several others.  Solr doesn’t have a concept
> > of “metadata” and “content”, just “fields" that you can use to search
> both.
> >
> > In the case of your METS data, you will need to first transform it into a
> > more simplified document (Solr XML) containing the fields that matter
> for a
> > particular search interface and are defined in the schema<
> > https://wiki.apache.org/solr/SchemaXml>.  This transform step can be
> done
> > in any number of ways, but XSLT is fairly common.  To index the full-text
> > content that your METS document points to, you can build that into your
> > transform script/stylesheet, or you can run a separate script/process
> later
> > that updates the record with the full-text.  In the case of a “compound
> > object” you may need to have a script iterate over lots of separate
> content
> > files and add them to the Solr document that represents a yearbook.
> >
> > There are a few ways to add data to a solr index, but a common one in
> > library-land is to add (and update) records to the Solr index by POSTing
> > your freshly “transformed" data via HTTP (here’s the Solr quickstart
> > tutorial).
> >
> > Customizing your search results (weighting, stemming, rows per page,
> etc.)
> > can be handled in the Solr config file<
> > 

Re: [CODE4LIB] oclc member code

2016-01-21 Thread Kyle Banerjee
Try something like this:

http://www.worldcat.org/webservices/registry/lookup/Institutions/oclcSymbol/OHS?serviceLabel=enhancedContent

Seems to me I messed with this sort of info some years back in an effort to
gather info about libraries in my consortium and found so much
redundant/outdated info that I wound up resorting to other methods to get
what I needed.
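
If you want to poke at the response programmatically, it comes back as XML.
Something like this in Python just dumps whatever elements have text rather
than guessing at names:

import requests
import xml.etree.ElementTree as ET

url = ("http://www.worldcat.org/webservices/registry/lookup/Institutions/"
       "oclcSymbol/BXM?serviceLabel=enhancedContent")
root = ET.fromstring(requests.get(url, timeout=10).content)
for el in root.iter():
    if el.text and el.text.strip():
        print(el.tag, "->", el.text.strip())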

kyle

On Thu, Jan 21, 2016 at 5:57 AM, Eric Lease Morgan  wrote:

> Given an OCLC member code, such as BXM for Boston College, is it possible
> to use some sort of OCLC API to search WorldCat (or some other database)
> and return information about Boston College? —Eric Lease Morgan
>


Re: [CODE4LIB] Anyone familiar with XSLT? Im stuck

2016-01-21 Thread Kyle Banerjee
>
> For simple situations one might do without XSLT and stuff
> XPath expressions for the content to grab into the command
> line of utilities like xml_grep or xpath.


In many cases, it's even easier to use string utilities, particularly if
there's any chance the XML is not totally valid.

If you're handy with vi, that's another option that would let you do this
kind of task in less than a minute without the need to write a program.
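
To make the string-utility point concrete, here's a two-minute Python
version that doesn't care whether the XML even parses -- the element name
is a stand-in for whatever you're actually after:

import re

with open("records.xml", encoding="utf-8") as fh:
    text = fh.read()

for hit in re.findall(r"<title>(.*?)</title>", text, flags=re.S):
    print(hit.strip())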

kyle


Re: [CODE4LIB] Creating/maintaining metadata for intangible concepts

2016-01-08 Thread Kyle Banerjee
Hi Laura,

You have the idea. There are a number of access points we'd like humans to
add based on space/time/location/use/visual elements in the photos
unrelated to the actual subject matter. There are a variety of approaches
that could be taken, and I've received helpful ideas offline on how to
proceed.

I'm not a fan of CYA policies, but I'm averse to adding the sort of tags
you removed because I cannot imagine how such tags wouldn't put our library
and institution in a very bad light while undermining organizational
priorities. However, the need for providing this sort of access is real so
we need to do something. The suggestions I received are mostly based on
restricting or obfuscating some metadata, and a solution along those lines
will probably be the ticket.

One particular idea that intrigued me was classification codes in a
specialized field. This provides a lot of display and search options as
what is displayed can be very different than what is stored/searched --
i.e. people could search in plain English and the results would appear
without it being obvious why the search works (and hopefully they wouldn't
wonder). The other thing I like about it is it's easy to eliminate once the
need for it disappears.

Less sensitive stuff is more straightforward. My gut reaction is that
regular tags stored as multiword non-tokenized strings (to prevent
pollution of search results and the subject index) might be a good
approach. But since many libraries have needs similar to ours, I thought
I'd ask as I'm sure a lot of people have given this issue more thought than
I have.

kyle



On Fri, Jan 8, 2016 at 8:44 AM, Laura Buchholz 
wrote:

> Kyle, I don't know if I'm understanding your question correctly, but I
> think this is something I was just reviewing. I removed "Diversity" as a
> subject term (we're a little loose here in applying subjects terms that
> aren't directly in the photo) from some photos that were of, for example, a
> single student studying on the lawn or in the coffee shop. The diversity in
> the photo was that the student was of color. When there is an image of a
> white student, we wouldn't put "homogeneity" or something like that, so I
> took off "diversity". But, as you say, users do want to be able to search
> for these concepts, and I think it is important not to erase differences
> just because it is difficult to represent that without being essentialist
> in metadata.
>
> Are you trying to automate this process, or are humans doing this? If
> automated, watch out for what Google Photos did:
>
> http://www.usatoday.com/story/tech/2015/07/01/google-apologizes-after-photos-identify-black-people-as-gorillas/29567465/
>
>


[CODE4LIB] Creating/maintaining metadata for intangible concepts

2016-01-07 Thread Kyle Banerjee
We are looking for ideas to help users search our collections for photos
based on concepts (e.g. diversity) rather than the subject matter depicted
in the photo. Since the high priority institutional objectives are often
behind requests for these items, we'd really like a better solution than
telling them to go fishing.

Concepts that are not part of the subject matter are subjective and
contextual by nature, but we can identify photos that satisfy some of the
more common requests. However, we have not come up with a way to add
appropriate access points -- methods that come to mind would/should get
anyone involved tarred, feathered, and run out of town on a rail.

Parenthetical qualifiers work reasonably well for some concepts and visual
elements not directly related to the actual subject matter in a photo. But
this method doesn't work for sensitive topics.

Has anyone come up with a good way to provide this sort of access? Thanks,

kyle


Re: [CODE4LIB] Marc record creation and matching

2015-10-28 Thread Kyle Banerjee
On Wed, Oct 28, 2015 at 6:03 PM, Terry Reese  wrote:

> Honestly -- if this was me and I didn't have load table training (even if I
> did) -- I would export the MARC records from my III system that I wanted to
> overlay.  I would create MARC records from the Excel sheets -- then I would
> use a tool to merge the data between the generated records and the source
> records.  You can again, do this via a script -- or likely with MarcEdit's
> Merge Records tool.  Then, I would reload the records back into III using
> the 949 -- overlaying on the bib number.  This of course overwrites the
> records in your catalog -- but that should be ok since you are using the
> records from your catalog as your source.
>

I favor the method Terry suggests. I haven't messed with a load table in
longer than I should admit but unless I'm forgetting something (and someone
with more recent knowledge should correct me), the addition of the note
can't be done with a "match and attach" table. This means you'd need to do
funky things with field protection to use load tables to handle the
matching.

You have to create MARC records from the spreadsheet anyway. Much easier to
merge them with a MARC dump from your system, add the notes, and overlay.
If you're worried, you can manually check the file before loading it. BTW,
I've overlaid hundreds of thousands of records in a single operation and
the process is quite safe. This will tweak the mod dates, so you'll want to
separate out records that didn't match if that's an issue.

kyle


>
>
>
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Stephen Grasso
> Sent: Wednesday, October 28, 2015 8:16 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] Marc record creation and matching
>
> Greetings all,
>
> My colleague and I want to create MARC records from a spreadsheet and then
> import those MARC records into our library system (Millennium) . We want
> those records we have imported to match on ISBN. We want to keep the
> integrity of the data in the catalogue (we don't want  the newly created
> records to overlay material currently in the catalogue) and we want, in the
> same process, to be able to insert a note in the records in the catalogue
> that have been matched with our records created from the spreadsheet.
>
> All ideas will be gratefully received,
>
> Kind regards,
>
>
> Steve Grasso
>
> Resource Librarian
> Library Resource Services
> Queensland University of Technology Library Kelvin Grove Campus | Level 1,
> D
> Block | Victoria Park Road | Kelvin Grove QLD 4059 AUSTRALIA
> t: + 61 7 3138 5574 |f: +61 7 3138 3994 |e
> s.gra...@qut.edu.au
>
> CRICOS No 00213J
>


[CODE4LIB] Video playback issues

2015-09-18 Thread Kyle Banerjee
Howdy all,

A number of researchers at our institution use devices that take
time-sequence photos and transmit the images to software that converts these to
AVI. In general, it's pretty straightforward. However, we are encountering
cases where the AVI files created on Macs don't play properly in VLC player
-- only a single frame displays while the progress bar moves. They work
fine in Windows Media Player.

I tried converting the files using the syntax:

 avconv -i test.avi -c copy test.mp4


which resulted in a file that worked great in VLC, but then Windows Media
Player complains that it can't support the file type or might not support
the codec. I then tried:

avconv -i test.avi -c:v libx264 -c:a copy test.mp4


and wound up with the original problem (works fine in Windows Media Player,
but not in VLC). If I upload one of these files to youtube and download the
mp4, it works fine everywhere.
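
One permutation I haven't tried yet would be to force the pixel format
during the re-encode, since yuv420p is presumably what youtube's transcode
produces (untested guess):

    avconv -i test.avi -c:v libx264 -pix_fmt yuv420p -c:a copy test.mp4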

What am I missing? Our researchers generate a lot of these files, so this
would be a good problem to solve. Thanks,

kyle


Re: [CODE4LIB] "coders for libraries"

2015-09-01 Thread Kyle Banerjee
I'm a little surprised that on a list populated by metadata geeks, no one has 
suggested that just the title (i.e. code4lib) be in the title element ;-)



—
Sent from Mailbox

On Tue, Sep 1, 2015 at 4:51 PM, Tom Cramer  wrote:

> You can tell it’s a public library because you get a 404. At a private 
> library, you’d get a 403. 
> - Tom
>> On Sep 1, 2015, at 2:07 PM, Michael J. Giarlo  
>> wrote:
>> 
>> DPLA is the finest library of 404s I've seen.
>> 
>> On Tue, Sep 1, 2015 at 2:00 PM Tom Johnson 
>> wrote:
>> 
>>> Eric, your suggestion simply won't do:
>>> http://dp.la/item/e637ec0731c3129dc4f6ff4c5e528bda is a 404.
>>> 
>>> - Tom
>>> 
>>> On Tue, Sep 1, 2015 at 1:07 PM, Eric Phetteplace 
>>> wrote:
>>> 
 "code4lib | e637ec0731c3129dc4f6ff4c5e528bda"
 
 In all seriousness, I think coming up with an inclusive tagline is a
>>> great
 idea. How about "people, libraries, code"?
 
 On Tue, Sep 1, 2015 at 12:25 PM Laura Smart 
 wrote:
 
> Rotating slogans FTW.
> Laura
> 
> On Tue, Sep 1, 2015 at 12:03 PM, Sarah Shealy <
>>> sarah.she...@outlook.com>
> wrote:
> 
>> +1 to both
>> 
>>> Date: Tue, 1 Sep 2015 11:58:39 -0700
>>> From: dei...@uw.edu
>>> Subject: Re: [CODE4LIB] "coders for libraries"
>>> To: CODE4LIB@LISTSERV.ND.EDU
>>> 
>>> Code4Lib | Libers for Codaries
>>> 
>>> 
>>> Kate Deibel, PhD | Web Applications Specialist
>>> Information Technology Services
>>> University of Washington Libraries
>>> http://staff.washington.edu/deibel
>>> 
>>> --
>>> 
>>> "When Thor shows up, it's always deus ex machina."
>>> 
>>> On 9/1/2015 11:39 AM, scott bacon wrote:
 Code4Lib | We Are The Wind Beneath Your Wings
 
 On Tue, Sep 1, 2015 at 2:31 PM, Wilhelmina Randtke <
> rand...@gmail.com>
 wrote:
 
> In general, it's not great to refer to people as nouns.  It's
 better
>> to say
> people with an adjective, so the person isn't replaced or given
 just
>> one
> identity.  I support not calling people coders or other noun.
> 
> -Wilhelmina Randtke
> 
> On Tue, Sep 1, 2015 at 9:42 AM, Eric Hellman 
>> wrote:
> 
>> Between September and November of 2008, the title attribute of
 the
>> Code4lib homepage was changed from "code4lib | Code for
 Libraries"
> to
>> "code4lib | coders for libraries, libraries for coders".
>> 
>> Dave Winer, who could be considered the inventor of the blog,
>> recently
>> tweeted about us:
>> 
>> "code4lib: coders for libraries, libraries for coders. (I
>>> really
>> hate the
>> word "coders.") code4lib.org "
>> 
>> As someone who feels that Code4Lib should welcome people who
 don't
>> particularly identify as "coders", I would welcome a return to
 the
> previous
>> title attribute.
>> 
>> Eric Hellman
>> President, Free Ebook Foundation
>> Founder, Unglue.it https://unglue.it/
>> http://go-to-hellman.blogspot.com/
>> twitter: @gluejar
>> 
> 
>> 
> 
 
>>> 


Re: [CODE4LIB] Protocol-relative URLs in MARC

2015-08-17 Thread Kyle Banerjee
Information in subfield u should be complete, but even if that weren't the
case, it's important to consider how systems handle the information they're
given. MARC is just a container, and just because the information is
syntactically kosher does not mean it will be processed how you like.

In the case at hand,  you can do anything you like if the information is
just used locally and your systems behaves the way you need. As Andrew
mentions, you'll run into trouble if this information gets imported into
other systems.

kyle


On Mon, Aug 17, 2015 at 1:41 PM, Stuart A. Yeates syea...@gmail.com wrote:

 I'm in the middle of some work which includes touching the 856s in lots of
 MARC records pointing to websites we control. The websites are available on
 both https://example.org/ and http://example.org/

 Can I put //example.org/ in the MARC or is this contrary to the standard?

 Note that there is a separate question about whether various software
 systems support this, but that's entirely secondary to the question of the
 standard.

 cheers
 stuart
 --
 ...let us be heard from red core to black sky



Re: [CODE4LIB] Processing Circ data

2015-08-06 Thread Kyle Banerjee
On Wed, Aug 5, 2015 at 1:07 PM, Harper, Cynthia char...@vts.edu wrote:

 Hi all. What are you using to process circ data for ad-hoc queries.  I
 usually extract csv or tab-delimited files - one row per item record, with
 identifying bib record data, then total checkouts over the given time
 period(s).  I have been importing these into Access then grouping them by
 bib record. I think that I've reached the limits of scalability for Access
 for this project now, with 250,000 item records.  Does anyone do this in
 R?  My other go-to- software for data processing is RapidMiner free
 version.  Or do you just use MySQL or other SQL database?  I was looking
 into doing it in R with RSQLite (just read about this and sqldf
 http://www.r-bloggers.com/make-r-speak-sql-with-sqldf/ ) because I'm sure
 my IT department will be skeptical of letting me have MySQL on my desktop.
 (I've moved into a much more users-don't-do-real-computing kind of
 environment).  I'm rusty enough in R that if anyone will give me some
 start-off data import code, that would be great.


As has been mentioned already, it's worth investigating whether OpenRefine
or sqlite are options for you. If not, I'd be inclined to explore
solutions that don't rely on your local IT dept.
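
To give a flavor of the sqlite route, here's a minimal sketch with the
sqlite3 command-line shell. The file and column names (bib_id, checkouts)
are placeholders, and a reasonably current sqlite3 will create the table and
take column names from the CSV header row:

    $ sqlite3 circ.db
    sqlite> .mode csv
    sqlite> .import items.csv items
    sqlite> SELECT bib_id, SUM(checkouts) AS total
       ...>   FROM items GROUP BY bib_id ORDER BY total DESC LIMIT 25;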

It's so easy to spend far more time going through approval, procurement,
and then negotiating local IT security/policies than actually working that
it pays to do a lot of things on the cloud. There are many services out
there, but I like Amazon for occasional need things because you can
provision anything you want in minutes and they're stupid cheap. If all you
need is mysql for a few minutes now and then, just pay for Relational
Database Services. If you'd rather have a server and run mysql off it, get
an EBS backed EC2 instance (the reason to go this route rather than
instance store is improved IO and your data is all retained if you shut off
the server without taking a snapshot). Depending on your usage, bills of
less than a buck a month are very doable. If you need something that runs
24x7, other routes will probably be more attractive. Another option is to
try the mysql built into cheapo web hosting accounts like bluehost, though
you might find that your disk IO gets you throttled. But it might be worth
a shot.

If doing this work on your desktop is acceptable (i.e. other people don't
need access to this service), you might seriously consider just doing it on
a personal laptop that you can install anything you want on. In addition to
mysql, you can also install VirtualBox which is a great environment for
provisioning servers that you can export to other environments or even
carry around on your cell phone.

With regards to some of the specific issues you bring up, 40 minutes for a
query on a database that size is insane, which indicates the tool you have
is not up for the job. Because of the way databases store info, performance
degrades on a logarithmic (rather than linear) basis on indexed data. In
plain English, this means even queries on millions of records take
surprisingly little power. Based on what you've described, changing a field
from variable to fixed might not save you any space and could even increase
it depending on what you have. In any case, the difference won't be worth
worrying about.

Whatever solution you go with, I'd recommend learning to provision yourself
resources when you can find some time. Work is hard enough when you can't
get the resources you need. When you can simply assign them to yourself,
the tools you need are always at hand so life gets much easier and more fun.

kyle


Re: [CODE4LIB] Looking for Ideas on Line Breaks in OCR Text

2015-08-04 Thread Kyle Banerjee
On Tue, Aug 4, 2015 at 6:09 AM, Matt Sherman matt.r.sher...@gmail.com
wrote:

 I am on Windows machines, so I don't have quite the easy access to
 that useful command.  Someone had earlier put the OCR in a doc file so
 I've been playing with that more than with the raw PDF OCR.


Versions of the unix utilities that run on Windows are available, but you
can just use Microsoft Word to do what you want. Just use the find/replace
function. In Word, you can search for a paragraph marker by looking for
^p (caret p)

Because you undoubtedly have real paragraphs in the document which you
don't want to remove, I'd recommend substituting double paragraph marks
with something unique (e.g. @ZZZ@) before replacing all the other
paragraph marks with a space. Then replace your unique marker with a
paragraph.
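
The same three-step substitution works outside Word too. As a sketch, a Perl
one-liner over a plain-text export (assumes Unix line endings):

    # protect real paragraph breaks, join the wrapped lines, restore the breaks
    perl -0777 -pe 's/\n\n+/\@ZZZ\@/g; s/\n/ /g; s/\@ZZZ\@/\n\n/g' in.txt > out.txt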

HTH,

kyle


Re: [CODE4LIB] Regex Question

2015-07-07 Thread Kyle Banerjee
Y'all are doing this the hard way. Word allows regex replacements as well
as format based criteria.

For this particular use case:

   1. Open the find/replace dialog (Ctrl+H)
   2. In the "Find what" box, put (*) -- make sure the "Use Wildcards"
   option is selected, and for the format, specify italic
   3. In the "Replace with" box, just put \1 and specify All caps

And you're done

kyle

On Tue, Jul 7, 2015 at 9:32 AM, Thomas Krichel kric...@openlib.org wrote:

   Eric Phetteplace writes

  You can match a string of all caps letters like [A-Z]

   This works if you are limited to English. But in a multilingual
   setting, you need to watch out for other uppercases, such as
   крихель vs КРИХЕЛЬ. It then depends on the unicode implementation
   of your regex application. In Perl, for example, you would use
   [[:upper:]].
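
   For example, a one-liner sketch (-CSD turns on UTF-8 I/O so the class
   matches Cyrillic as well):

     perl -CSD -ne 'print if /[[:upper:]]/' names.txt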


 --

   Cheers,

   Thomas Krichel  http://openlib.org/home/krichel
   skype:thomaskrichel



Re: [CODE4LIB] Regex Question

2015-07-07 Thread Kyle Banerjee
For clarity, Word does regex, not just wildcards.  It's not quite as
complete as what you'd get with some other environments such as OpenOffice
Writer since matching is lazy rather than greedy which can be a big deal
depending on what you're doing and there are a couple other catches --
notably no support for | -- but it's reasonably powerful. There is no
regexp capability in Excel unless you're willing to use VBA.

kyle

On Tue, Jul 7, 2015 at 1:10 PM, Gordon, Bonnie bgor...@rockarch.org wrote:

 OpenOffice Writer (or a similar program) may be useful for this. It would
 allow you to search by format while using a more controlled regular
 expression than MS Word's wildcards.

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Matt Sherman
 Sent: Tuesday, July 07, 2015 12:45 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Regex Question

 Thanks everyone, this really helps.  I'll have to work out the italicized
 stuff, but this gets me much closer.

 On Tue, Jul 7, 2015 at 12:43 PM, Kyle Banerjee kyle.baner...@gmail.com
 wrote:

  Y'all are doing this the hard way. Word allows regex replacements as
  well as format based criteria.
 
  For this particular use case:
 
 1. Open the find/replace dialog (Ctrl+H)
 2. In the "Find what" box, put (*) -- make sure the "Use Wildcards"
 option is selected, and for the format, specify italic
 3. In the "Replace with" box, just put \1 and specify All caps
 
  And you're done
 
  kyle
 
  On Tue, Jul 7, 2015 at 9:32 AM, Thomas Krichel kric...@openlib.org
  wrote:
 
 Eric Phetteplace writes
  
You can match a string of all caps letters like [A-Z]
  
 This works if you are limited to English. But in a multilingual
 setting, you need to watch out for other uppercases, such as
 крихель vs КРИХЕЛЬ. It then depends in the unicode implementation
 of your regex application. In Perl, for example, you would use
 [[:upper:]].
  
  
   --
  
 Cheers,
  
 Thomas Krichel  http://openlib.org/home/krichel
 skype:thomaskrichel
  
 



Re: [CODE4LIB] Desiring Advice for Converting OCR Text into Metadata and/or a Database

2015-06-18 Thread Kyle Banerjee
How you want to preprocess and structure the data depends on what you hope
to achieve. Can you say more about what you want the end product to look
like?

kyle

On Thu, Jun 18, 2015 at 10:08 AM, Matt Sherman matt.r.sher...@gmail.com
wrote:

 That is a pretty good summation of it yes.  I appreciate the suggestions,
 this is a bit of a new realm for me and while I know what I want it to do
 and the structure I want to put it in, the conversion process has been
 eluding me so thanks for giving me some tools to look into.

 On Thu, Jun 18, 2015 at 1:04 PM, Eric Lease Morgan emor...@nd.edu wrote:

  On Jun 18, 2015, at 12:02 PM, Matt Sherman matt.r.sher...@gmail.com
  wrote:
 
   I am working with a colleague on a side project which involves some
   scanned bibliographies and making them more web
   searchable/sortable/browse-able. While I am quite familiar with the
   metadata and organization aspects we need, I am at a bit of a loss on
   how to automate the process of putting the bibliography in a more
   structured format so that we can avoid going through hundreds of pages
   by hand. I am pretty sure regular expressions are needed, but I have not
   had an instance where I needed to automate extracting data from one file
   type (PDF OCR or text extracted to Word doc) and placing it into another
   (either a database or an XML file) with some enrichment. I would
   appreciate any suggestions for approaches or tools to look into. Thanks
   for any help/thoughts people can give.
 
 
  If I understand your question correctly, then you have two problems to
  address: 1) converting PDF, Word, etc. files into plain text, and 2)
  marking up the result (which is a bibliography) into structure data.
  Correct?
 
  If so, then if your PDF documents have already been OCRed, or if you have
  other files, then you can probably feed them to TIKA to quickly and
 easily
  extract the underlying plain text. [1] I wrote a brain-dead shell script
 to
  run TIKA in server mode and then convert Word (.docx) files. [2]
 
  When it comes to marking up the result into structured data, well, good
  luck. I think such an application is something Library Land sought for a
  long time. Can you say “Holy Grail”?
 
  [1] Tika - https://tika.apache.org
  [2] brain-dead script -
  https://gist.github.com/ericleasemorgan/c4e34ffad96c0221f1ff
 
  —
  Eric
 



Re: [CODE4LIB] LC Cutter Generator - does this exist?

2015-05-12 Thread Kyle Banerjee
There's one built into the Cataloging Calculator. It's a javascript program
I wrote 18 years ago for Netscape 4.0, but it still works and gets
significant use.

Since you're working server side, you'll probably just want to copy the
method used rather than use the code outright, though anyone is welcome
to use this embarrassingly written stuff as they see fit. The cutter()
function would be easy to adopt into any language and is only a couple
dozen lines long. It can be found in the
http://calculate.alptown.com/calculate.js file

kyle

On Tue, May 12, 2015 at 1:37 PM, Justin Rittenhouse jritt...@nd.edu wrote:

 This seems like a fairly straightforward thing to write...so it seems like
 someone else would already have done so.  That said, my Google-fu hasn't
 come across anything yet.  So...does anyone have a cutter generator that
 they're willing to share?  Language doesn't matter, although most of our
 stuff is Perl or Ruby, so if it's one of those that's a plus.

 Thanks!
 Justin

 --
 *Justin Rittenhouse*
 *Sr. Application Development Technician, Web and Software Engineering*
 *Hesburgh Libraries*



Re: [CODE4LIB] How to measure quality of a record

2015-05-06 Thread Kyle Banerjee
 On May 6, 2015, at 7:08 AM, James Morley james.mor...@europeana.eu wrote:
 
 I think a key thing is to determine to what extent any definition of 
 'completeness' is actually a representation of 'quality'.  As Peter says, 
 making sure not just that metadata is present but then checking it conforms 
 with rules is a big step towards this. 

This. 

Basing quality measures too much on the presence of certain data points or the 
volume of data is fraught with peril. In experiments in the distant past, my 
experience was that looking for structure and syntax patterns that indicate 
good/bad quality as well as considering record sources was useful. Also keep in 
mind that any scoring system is to some extent arbitrary, so you don't want to 
read more into what it generates than appropriate.

Kyle


Re: [CODE4LIB] Mac OS 9 emulator

2015-04-23 Thread Kyle Banerjee
On Thu, Apr 23, 2015 at 10:20 AM, Schmitz Fuhrig, Lynda 
schmitzfuhr...@si.edu wrote:

 Thanks for the responses.

 We actually need to read media within it so Virtual Box would not work for
 us.


Could you say a bit more about your use case? Some applications such as
dealing with archival materials might actually require actual hardware in
which case ebay may be the best option.

kyle


Re: [CODE4LIB] Recommendations for places to advertise for a library systems guru?

2015-04-22 Thread Kyle Banerjee
On Wed, Apr 22, 2015 at 7:18 AM, Jack Hill jackh...@duke.edu wrote:



 I would also look at advertising through local technical user groups or
 meetings that touch on topics related to the job.


This. Also might not hurt to consider LinkedIn -- results from there can
surprise you.

Whatever you do, all committee members should use their professional
networks to identify and contact good candidates directly  -- some of the
best candidates might not realize they're in the market until someone talks
to them. The other advantage of identifying people through networks is you
typically know a lot more about what drives them and what they offer (and
by extension their prospects for success).

kyle


Re: [CODE4LIB] DSpace/Eprints vs Fedora

2015-04-10 Thread Kyle Banerjee

 If your discovery strategy is
 predicated on having your scholarly IR harvested and presented to the world
 through a separate discovery tool and the vast bulk of your document views
 are coming from Google and Google Scholar users, does this lessen the
 'compelling experience' requirement?...


Not at all. The reality is that the vast majority of users are not going to
start with a library portal, so no matter what you do, it needs to play
well with Google.

For singular items like individual documents and images, making things work
with different solutions isn't too bad. However, as the use case turns to
identifying all photos that satisfy a particular need, slicing and dicing a
dataset consisting of thousands of large files in a complex hierarchy,
exploring a large format book, relating audio/textual/visual archival
resources, etc, there are issues with getting the stuff efficiently into
the repository with appropriate metadata, how to maintain what's in the
repository (which includes tasks such as batch assignment of metadata or
updating items) in addition to requirements the users may have with regard to
navigating or interacting with the items.

kyle


Re: [CODE4LIB] Amazon Glacier - tracking deposits

2015-04-09 Thread Kyle Banerjee
Howdy Sara,

I've played around a bit with Glacier. It's a bit weird to work with, but
tools keep on improving.

The real question is what you hope to accomplish with it. As its name
implies, it's designed for stuff that is basically frozen. When you take
things out, you need to do so very slowly. The pricing model is such that
if you try to pull out stuff quickly (e.g. you're trying to restore a
system), the cost goes into the stratosphere -- definitely model what
things would look like before using it for purposes like backup.

However, if you have access images that are already backed up on disk or
tape offsite (i.e. system recovery needs already taken care of) and this is
just for storage of high res scans, Glacier could be a good way to go.

As far as the ID's go, I'd embed them directly into the access image
metadata. That way, it's impossible to lose the connection between the
image and the master. You can keep it elsewhere as well, but embedded
metadata is a great place to store critical identifiers.
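
For example, a one-line sketch with ExifTool (the identifier shown is a
placeholder):

    # write the archive ID into XMP Dublin Core inside the access copy
    exiftool -overwrite_original -XMP-dc:Identifier='glacier:EXAMPLEARCHIVEID' access.tif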

kyle

On Wed, Apr 8, 2015 at 3:32 PM, Sara Amato sam...@willamette.edu wrote:

 Has anyone leapt on board with Glacier?   We are considering using it for
 long term storage of high res archival scans.  We have derivative copies
 for dissemination, so don’t intend touching these often, if ever.   The
 question I have is how to best track the  Archive ID that glacier attaches
 to deposits, as it looks like that is the only way to retrieve information
 if needed (though you can attach a brief description also that appears on
 the inventory along with the id.)   We’re considering putting the ID in
 Archivist Toolkit, where the location of the dissemination copies is noted,
 but am wondering if there are other tools out there specific for this
 scenario that people are using.



Re: [CODE4LIB] talking about digital collections vs electronic resources

2015-03-19 Thread Kyle Banerjee
On Wed, Mar 18, 2015 at 9:51 AM, Laura Krier laura.kr...@gmail.com wrote:

 I think too often we present our collections to students through the
 framework of our own workflows and functional handling of materials


This.

We also try too hard to convey distinctions that aren't important to users
for the sake of technical accuracy. As a result, we sometimes introduce
problems that are worse than what we were trying to solve in first place.

There is also the issue that many people find library materials through
mechanisms other than the library provided silos -- particularly networked
resources. In reality, significant percentage of these users don't even
realize they're using the library.

kyle


[CODE4LIB] Job: Technology Director, Oregon Health Science University

2015-03-09 Thread Kyle Banerjee
Oregon Health & Science University (OHSU) Library in Portland seeks a
creative, dynamic, and innovative Technology Director.

OHSU is the state's only comprehensive academic health center and is made
up of the Schools of Dentistry, Medicine, and Nursing; College of Pharmacy;
numerous centers and institutes; OHSU Healthcare; and related programs. The
OHSU Library, the largest health sciences library in Oregon, serves the
faculty, staff, and students of OHSU, as well as health professionals and
residents of the State of Oregon. Library staff provide services in support
of teaching, research, patient care, and outreach. An active participant in
the Orbis Cascade Alliance, the Library is implementing the consortium’s
ambitious strategic agenda to push boundaries, change the landscape, and
inspire the profession.

Digital initiatives are a major priority of the Library, which is migrating
most library systems to cloud-based solutions. The Technology Director will
represent the Library in campus and regional partnerships, manage and
support technology projects, and directly implement solutions. As the lead
of a small team that collaborates with many partners, the Technology
Director’s duties will range from liaising with stakeholders to coding new
search applications, from managing staff to integrating content with linked
data and semantic technologies. This new position reflects the Library’s
investment in developing and using technology to deliver services,
facilitate research, and improve education.

Current initiatives in which the Technology Director could play a role
include development of semantic technologies for rare disease diagnostics;
development of search tools to query local and external data stores for
translational research; digital asset management in support of research,
teaching, archives, and strategic communications; delivery of knowledge
management tools in electronic health records for clinical education and
patient care; publication of archival public health data for use in modern
field research; and data management services to facilitate sharing and
reproducibility.

Position Description:

Reporting to the University Librarian, the Technology Director provides
leadership, vision, and management for the Library’s digital initiatives.
This position leads technical development efforts including integration of
systems, development of new applications, and implementations to support
infrastructure, software, and services. The Technology Director extends
Library technologies in joint projects with campus (e.g. Information
Technology Group, Teaching and Learning Center) and regional partners (e.g.
Orbis Cascade Alliance). As a member of the library leadership team, this
position takes an active part in strategic planning; sets goals and
objectives; serves on the Library Council that includes representatives
from management, professional, and classified staff; supervises the Digital
Collections and Metadata Librarian, two Systems/Applications Analysts, the
Web Manager, and a Library Technician; and collaborates with Library,
campus, community, and regional partners on technology solutions to serve
the OHSU education, research, and clinical communities, and residents of
the State of Oregon. As a member of the Library Faculty, the Technology
Director participates in planning, policy formation, and decision-making
relating to health sciences services, collections, and technologies. This
position requires scholarship and service that contributes to the
effectiveness of the Library, the University, and the profession.

Required Qualifications:

• Accredited graduate degree in an appropriate discipline (e.g. library and
information science, computer science, or research science);
• Five years of professional experience in an academic or health sciences
setting;
• Significant supervisory experience that promotes teamwork and
collaboration with library, campus, or consortial partners;
• Demonstrated success in mentoring, developing, and empowering staff with
a collaborative and open approach;
• Positive leadership style and ability to thrive in a fast-paced
environment;
• Evidence of initiative and flexibility;
• Significant practical experience with software project management, issue
tracking, and version control in a team based environment;
• A solid understanding of metadata strategies and data representation, and
their application in health sciences and libraries;
• Ability to determine requirements and develop specifications for data and
information-driven systems;
• Experience with current and emerging data architectures and technologies
to develop new and leverage legacy data services and applications;
• Proficiency with programmatic submission and retrieval of data from
repositories;
• Strong programming skills with a solid understanding of object oriented
languages and principles;
• Demonstrated ability to manage expectations and priorities diplomatically
among various stakeholders;
• History 

Re: [CODE4LIB] Code4lib 2016 - tracks

2015-02-25 Thread Kyle Banerjee
On Mon, Feb 23, 2015 at 5:10 PM, Cary Gordon listu...@chillco.com wrote:

 If Code4LibCon changes, I will be disappointed, but I will still go.


I think it's changed a great deal over the years. But all things must
evolve to stay relevant.

I do think it would be a shame if the content and dynamics at c4l became
the same as the other conferences out there. Nowadays, all library
conferences include tech content, some of it quite decent.

kyle


Re: [CODE4LIB] examples of displays for compound objects and metadata

2015-01-28 Thread Kyle Banerjee
The best way to display compound objects really depends on the nature of
the compound objects. For example, the optimal display for a book stored as
a compound object will be very different than an art object taken from
various vantage points or a dataset. Likewise, whether you can get away
with not creating/displaying metadata for components of compound objects
depends on the use case. If you could say a bit more about what kind of
compound objects you have and what system(s) you are migrating to, people
could probably give you better advice.

kyle


On Wed, Jan 28, 2015 at 1:43 PM, Laura Buchholz laura.buchh...@reed.edu
wrote:

 We're migrating from CONTENTdm and trying to figure out how to display
 compound objects (or the things formerly known as compound objects) and
 metadata for the end user. Can anyone point me to really good examples of
 displaying items like this, especially where the user can see metadata for
 parts of the whole? I'm looking more for examples of the layout of all the
 different components on the page (or pages) rather than specific image
 viewers. Our new system is homegrown, so we have a lot of flexibility in
 deciding where things go.

 We essentially have:
 -the physical item (multiple files per item of images of text, plain
 text, pdf)
 -metadata about the item
 -possibly metadata about a part of the item (think title/author/subjects
 for a newspaper article within the whole newspaper issue), of which the
 titles might be used for navigation through the whole item.

 I think Hathi Trust has a good example of all these components coming
 together (except viewing non-title metadata for parts), and I'm curious if
 there are others. Or do most places just skip creating/displaying any kind
 of metadata for the parts of the whole?

 Thanks for any help!

 --
 Laura Buchholz
 Digital Assets Specialist
 Reed College
 503-517-7629
 laura.buchh...@reed.edu



Re: [CODE4LIB] Conference photography policy

2015-01-26 Thread Kyle Banerjee
On Mon, Jan 26, 2015 at 6:58 AM, Galen Charlton g...@esilibrary.com wrote:

 I would like to propose that C4L adopt a policy requiring that consent
 be explicitly given to be photographed or recorded, along the lines of
 a policy adopted by the Evergreen Project. [1]


As a practical matter, this is functionally equivalent to prohibiting
photography except for arranged photos, which will need something simple
(like pictures of cameras and mikes with slashes through them posted
throughout the venue) to communicate the policy. Differential badges,
lanyards, etc. will not always be visible, and not all people will notice
them, be aware of what they mean, or can be assumed to be familiar with a
written policy. As a side note, a lot of activity occurs outside the
official venues, and it is in these areas that people might be most
vulnerable to unwanted photos.

kyle


Re: [CODE4LIB] Checksums for objects and not embedded metadata

2015-01-25 Thread Kyle Banerjee
On Sat, Jan 24, 2015 at 11:07 AM, Rosalyn Metz rosalynm...@gmail.com
wrote:


- How is your content packaged?
- Are you talking about the SIPs or the AIPs or both?
- Is your content in an instance of Fedora, a unix file structure, or
something else?
- Are you generating checksums on the whole package, parts of it, both?


The quick answer is that this is a low-tech operation. We're
currently on regular filesystems where we are limited to feeding md5
checksums into a list. I'm looking for a low tech way that makes it easier
to keep track of resources across a variety of platforms in a decentralized
environment and which will easily adapt to future technology transitions.
For example, we have a bunch of stuff in Bepress and Omeka. Neither of
those is good for preservation, so authoritative files live elsewhere as do
a huge number of resources that aren't in these platforms. Filenames are
terrible identifiers and things get moved around even if people don't mess
with the files.

We also are trying to come up with something that deals with different
kinds of datasets (we're focusing on bioimaging at the moment) and fits in
the workflow of campus units, each of which needs to manage tens of
thousands of files with very little metadata on regular filesystems. Some
of the resources are enormous in terms of size or number of members.

Simply embedding an identifier in the file is a really easy way to tell
which files have metadata and which metadata is there. In the case at hand,
I could just do that and generate new checksums. But I think the generic
problem of making better use of embedded metadata is an interesting one as
it can make objects more usable and understandable once they're removed.
For example, just this past Friday I received a request to use an image
someone downloaded for a book. Unfortunately, he just emailed me a copy of
the image, described what he wanted to do, and asked for permission but he
couldn't replicate how he found it. An identifier would have been handy as
would have been embedded rights info as this is not the same for all of our
images. The reason we're using DOI's is that they work well for anything
and can easily be recognized by syntax wherever they may appear.

On Sat, Jan 24, 2015 at 7:06 PM, Joe Hourcle onei...@grace.nascom.nasa.gov
 wrote:


 The problems with 'metadata' in a lot of file formats is that they're
 just arbitrary segments -- you'd have to have a program that knew
 which segments were considered 'headers' vs. not.  It might be easier
 to have it be able to compute a separate checksum for each segment,
 so that should the modifications change their order, they'd still
 be considered valid.


This is what I seemed to be bumping up against, so I was hoping there was an
easy workaround. But this is helpful information.
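
For images specifically, a partial workaround is to checksum only the
decoded pixel data and ignore the wrapper. ImageMagick can do this directly
-- as a sketch, the %# format escape prints a hash computed over the pixel
values, so metadata edits leave it unchanged:

    identify -format '%#\n' master.tif

Thanks,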

kyle


Re: [CODE4LIB] wifi / network use policies

2015-01-23 Thread Kyle Banerjee
I haven't managed a network for years, but our approach was to provide a
broad statement of what the network was for and to make it clear the
network couldn't be used for malicious or illegal purposes.

The CYA policy is a start, but you'll still have to deal with problems such
as people using the network to stalk/harass others, to attack other systems
(intentionally or unintentionally), and for piracy. Balancing user needs with
very real privacy issues, network capacity, and the sad fact that some
people act like jerks when they can hide behind a veil of anonymity is
challenging. I'm glad I don't have to worry about that kind of stuff
anymore.

kyle


On Thu, Jan 22, 2015 at 6:11 AM, Nate Hill nathanielh...@gmail.com wrote:

 Hi all,

 I wonder if libraries that manage their own networks, either academic or
 public, would be willing to share their wifi / network use policies with
 me?  I'm working with the city of Chattanooga to separate our library's 4th
 Floor GigLab http://blog.giglab.io/ from the city's network.  The 4th
 Floor is our library's beta space / makerspace / civic lab, and we are
 constantly running public experiments of one kind or another here.  Our ISP
 has given us a separate 1gig fiber drop for this space, and we intend to
 use (or keep using) the whole area as a public laboratory to experiment
 with the network, hardware, and software.

 So... I need to get a policy to city legal for review and to my board
 before we actually make this separation.  I don't really want to go to jail
 when someone hacks North Korea from the library's GigLab.

 Thanks for any documents or input you all might provide,

 Nate


 --
 Nate Hill
 nathanielh...@gmail.com
 http://4thfloor.chattlibrary.org/
 http://www.natehill.net



[CODE4LIB] Checksums for objects and not embedded metadata

2015-01-23 Thread Kyle Banerjee
Howdy all,

I've been toying with the idea of embedding DOI's in all our digital assets
and possibly inserting/updating other metadata as well. However, doing this
would alter checksums created using normal methods.

Is there a practical/easy way to checksum only the objects themselves
without the metadata? If the metadata in a tiff or other kind of file is
modified, it does nothing to the actual object. Since providing more
complete metadata within objects makes them more usable/identifiable and
might simplify migrations down the road, it seems like this wouldn't be a
bad way to go.

Thanks,

kyle


Re: [CODE4LIB] linked data and open access

2014-12-23 Thread Kyle Banerjee


 Well, that raises an important question -- whether an 'end user use', or
 other use, do people have examples of neat/important/useful things done
 with linked data in Europe, especially that would have been harder or less
 likely without the data being modelled/distributed as linked data?


I'm sure they're doing quite a few things in Europe, but there is also
practical stuff going on with linked data in the US, such as Eagle-i, which
aims to facilitate sharing of biomedical research resources. My guess is
that a number of
people working on that are on this list.

At my own institution, research is being done on using ontologies and linked
data to diagnose diseases. The method requires huge amounts of data, but it
potentially allows diagnosis of problems that could not be discovered any
other way. One of the people working in that group was hired by Tesla last
year -- they apparently use linked data to solve problems internally, but
I'm not sure for what.

kyle


Re: [CODE4LIB] linked data and open access

2014-12-19 Thread Kyle Banerjee
On Fri, Dec 19, 2014 at 7:57 AM, Joe Hourcle onei...@grace.nascom.nasa.gov
wrote:


 I can't comment on the linked data side of things so much, but in
 following all of the comments from the US's push for opening up access to
 federally funded research, I'd have to say that capitalism and
 protectionist attitudes from 'publishers' seem to be a major factor in the
 fight against open access.


That definitely doesn't help. But quite a few players own this problem.

Pockets where there is a culture of openness can be found but at least in
my neck of the woods, researchers as a group fear being scooped and face
incentive structures that discourage openness. You get brownie points for
driving your metrics up as well as being first and novel, not for investing
huge amounts of time structuring your data so that everyone else can look
great using what you created.

Libraries face their own challenges in this regard. Even if we ignore that
many libraries and library organizations are pretty tight with what they
consider their intellectual property, there is still the issue that most of
us are also under pressure to demonstrate impact, originality, etc. As a
practical matter, this means we are rewarded for contributing to churn,
imposing branding, keeping things siloed and local, etc. so that we can
generate metrics that show how relevant we are to those who pay our bills
even if we could do much more good by contributing to community initiatives.

With regards to our local data initiatives, we don't push the open data
aspect because this has practically no traction with researchers. What does
interest them is meeting funder and publisher requirements as well as being
able to transport their own research from one environment to another so
that they can use it. The takeaway from this is that leadership from the
top does matter.

The good news is that things seem to be moving in the right direction, even
if it is at the speed of goo.

kyle


Re: [CODE4LIB] Easy Borrow or another way to automate search/request across multiple catalogs?

2014-12-15 Thread Kyle Banerjee
The answer depends on your objective. The quick answer to your question is
that union catalogs are easier to maintain and generally work better than
federated searches.

Can you say a bit more about what catalogs need to be searched and what
needs to happen? For example, do the catalogs in question belong to a
consortium and is the mechanism used to get patrons books from libraries
other than their home libraries more circ or ILL based? What kinds of
systems are involved and how have you been dealing with the need you're
hoping to meet so far? Thanks,

kyle


On Mon, Dec 15, 2014 at 11:02 AM, Darylyne Provost dprov...@colby.edu
wrote:

 I'm new to Code4Lib, so just want to say hi first of all :)

 I've searched the archives but didn't find an answer. I am wondering if
 anyone has experience with Easy Borrow
 http://library.brown.edu/its/software/easyborrow/, or can anyone suggest
 other potential solutions to automate searching and requesting across
 multiple catalogs?

 I'm sure many other libraries have similar issues: our patrons have so many
 disparate catalogs to search/request it is confusing and cumbersome. I'm
 not a programmer, and we don't have one on staff at this time. So, I am
 trying to find an existing solution for which we might be able to outsource
 customization and then turn to internal (campus) support for help with
 maintenance.

 Many thanks in advance for any suggestions!

 Darylyne

 **
 Darylyne Provost
 Assistant Director for Systems, Web,  Emerging Technologies
 Colby College
 207.859.5117
 dprov...@colby.edu



Re: [CODE4LIB] Easy Borrow or another way to automate search/request across multiple catalogs?

2014-12-15 Thread Kyle Banerjee
On Mon, Dec 15, 2014 at 11:49 AM, Darylyne Provost dprov...@colby.edu
wrote:

 Thanks so much for your reply. Our patrons currently must choose from our
 combined ILS CBBCat (III's Sierra), which we share with two other colleges;
 three consortial systems, NExpress, MaineInfoNet, and as of tomorrow,
 ConnectNY (all III's INNreach); and then, if they have searched all of
 those systems and not been able to locate/request their item, interlibrary
 loan (WorldCat/Illiad).


You have a number of options, one of which is super low tech and easy to
implement. That option is to create javascript buttons that say "replicate
this search in ..." The javascript then scrapes the address bar to figure
out what kind of search they did and sends it with the correct syntax to
the other catalogs. Super lightweight, very easy to maintain and update,
and people get to search from their home system.

The second option is z39.50 -- assuming that the III libraries purchased
the z39.50 server product and the Koha system has a z39.50 server
configured. This has the advantage over the javascript method of searching
all the systems at the same time, but you have to interpret what comes back
and present it. Note that materials that appear available via z39.50 may
only be available to local patrons because there's no way to distinguish
borrowing privileges remotely.
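
If you want to kick the tires on a target, the yaz-client command-line tool
makes that easy (the Library of Congress server below is a public example):

    yaz-client z3950.loc.gov:7090/voyager
    Z> find @attr 1=4 "gone with the wind"
    Z> show 1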

If z39.50 is not an option, any federated search will be based on screen
scraping which I would not recommend.

The third major option is data dumps into a union catalog that people can
search. This will search nicely, but you have to develop a search interface
and getting real time availability is going to be problematic.

As far as requesting goes, how do patrons request stuff that's not at their
home system? Do they have accounts in the other systems (i.e. can they log
in like regular patrons) or is there a special request screen that
initiates an ILL process?

kyle


[CODE4LIB] Scanned PDF to text

2014-12-09 Thread Kyle Banerjee
Howdy all,

I've just started a project that involves harvesting large numbers of
scanned PDFs and extracting information from the OCR output.
The process I've started with -- use imagemagick to convert to tiff and
tesseract to pull out the OCR -- is more system intensive than I hoped it
would be.
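
For concreteness, the current pipeline looks roughly like this (exact flags
vary, but 300 DPI is the usual point of diminishing returns for tesseract):

    # render each page at 300 DPI, then OCR it
    convert -density 300 -depth 8 input.pdf page-%03d.tif
    tesseract page-000.tif page-000

(Where a harvested PDF already carries a text layer, pdftotext from poppler
can pull it out without rendering anything at all.)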

Is there an easier/faster process that I'm missing? Perl friendly solutions
are preferred because this fits in as part of a larger process. If I am
already using my best option, what kind of image parameters are recommended
if I want to hit the point of diminishing returns but not necessarily go
for the best possible? Thanks,

kyle


Re: [CODE4LIB] Balancing security and privacy with EZproxy

2014-11-20 Thread Kyle Banerjee
Personally, I'd be tempted to go the IP lockout route myself since the
patterns should be clear in the logs, but be aware that # megabytes gives a
reasonable level of control because you can set it to log rather than lock
out. I think the risk of locking legitimate users is low. Although people
can download mixed materials, my guess is that your abusing accounts are
not watching loads of video.
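
A minimal config sketch (the numbers are illustrative; drop -enforce and
EZproxy just logs offenders instead of suspending them):

    # suspend a login for two hours once it pulls 250 MB in any 15-minute window
    UsageLimit -enforce -interval 15 -expires 120 -MB 250 Global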

There are things you can do with user names that would make it easy enough
to uncover abuse without unduly compromising privacy. For example, you
could flush your logs frequently while extracting the number of downloads
you're interested in from individual users. Abuse accounts will be immediately
obvious. BTW, you can do some funky things with EZP that include
conditional logic, regexp searches, and rewriting that might be helpful.

Any path you take will protect user privacy far more than just about any
other site they visit. Plus, whoever maintains your network will
occasionally need to monitor specific computers to mitigate a wide variety
of problems. Systems used as a platform for abusive behavior, harassment,
or activity that causes harm to others get locked out and/or blacklisted
which will really hose your users. Getting that kind of thing cleared up
takes time because most places aren't nearly as forgiving as libraries.

kyle


On Wed, Nov 19, 2014 at 8:47 PM, Dan Scott deni...@gmail.com wrote:

 On Wed, Nov 19, 2014 at 4:06 PM, Kyle Banerjee kyle.baner...@gmail.com
 wrote:

  There are a number of technical approaches that could be used to identify
  which accounts have been compromised.
 
  But it's easier to just make the problem go away by setting usage limits
 so
  EZP locks the account out after it downloads too much.
 

 But EZProxy still doesn't let you set limits based on the type of download.
 You therefore have two very blunt sledge hammers with UsageLimit:

 - # of downloads (-transfers)
 - # of megabytes downloaded (-MB)

 # of downloads is effectively useless because many of our electronic
 resource platforms (hi Proquest and EBSCOHost) make between 50 and 150
 requests for JavaScript, CSS, and images per page, so you have to set your
 thresholds incredibly high to avoid locking out users who might be actively
 paging through search results. Any savvy abuser will just script their
 requests to avoid all of the JS/CSS/images to derive a list of PDFs, and
 then download just the PDFs, thereby staying well under the usage limits
 that legit users require... and I've seen exactly that happen through our
 proxy.

 # of megabytes downloaded is a pretty blunt tool as well, given that our
 multimedia-enriched databases now often serve up video and audio as well as
 HTML, images, and PDF files. For the pure audio and video streaming sites
 such as Naxos or Curio, you can set higher limits; but as vendors
 increasingly enrich their databases with audio and video, you're going to
 have to increase your general limits as well... and you can pull down a ton
 of PDFs under that cover.

 So no, I don't think it's easy to make the problem go away through the
 suggested approach, unless you're willing to err on the side of locking out
 legitimate users.



Re: [CODE4LIB] Balancing security and privacy with EZproxy

2014-11-20 Thread Kyle Banerjee
I can't remember the details because I haven't worked with EZP for years
and unfortunately, this stuff isn't documented.

Where I used it was in the user.txt file when authenticating. Things you
can do include setting/modifying session, regular EZP, and arbitrary
variables, as well as doing comparisons and file I/O. You can nest
expressions and perform reasonably sophisticated comparisons and
manipulations.

It is way more powerful than what appears in the documentation, but to get
started with it, you need someone who can provide some syntax and ideas. I
know people who know this stuff monitor c4l, so I'm hoping some of them
will weigh in.

kyle

On Thu, Nov 20, 2014 at 10:17 AM, Jonathan Rochkind rochk...@jhu.edu
wrote:

 On 11/20/14 1:06 PM, Kyle Banerjee wrote:

 BTW, you can do some funky things with EZP that include
 conditional logic


 Can you say more about funky things you can do with EZProxy involving
 conditional logic? Cause I've often wanted that but haven't found any! Are
 you talking about a particular part/area of EZProxy? (Logging?).



Re: [CODE4LIB] Balancing security and privacy with EZproxy

2014-11-20 Thread Kyle Banerjee
That assumes the credentials are in fact compromised. They could also be
given away or sold, including by the person they belong to. And while it is
trivially easy to employ proxies, only a handful of people bother.

Finding free EZP credentials is crazy easy on Google. Try it -- you'll have
more options than you know what to do with in less than a minute.

In any case, the simplest way to achieve what you're trying to do without
going the IP route is to log users and retain data only long enough to
allow processing by a minimal detection script.
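
A sketch of such a script, assuming the stock NCSA-style LogFormat where the
username lands in field 3 and bytes transferred in field 10:

    # total bytes per login, biggest consumers first
    awk '{b[$3] += $10} END {for (u in b) print b[u], u}' ezproxy.log | sort -rn | head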

kyle

On Thu, Nov 20, 2014 at 2:17 PM, Joshua Welker wel...@ucmo.edu wrote:

 Blocking the IP is the obvious solution but not ideal at all. First off,
 it's trivially easy to bypass IP blacklists using proxies. I don't want to
 play a game of never-ending IP whack-a-mole. Second, it notifies the
 attacker that we are onto them, which makes it less likely for us to catch
 them. We want to figure out which accounts are compromised so that we can
 fix the problem at the source rather than fixing symptoms. If EZproxy is
 being abused, then it's just as likely that other, more valuable systems at
 the university are being abused as logins are shared between many systems.

 Josh Welker


 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Kyle
 Banerjee
 Sent: Thursday, November 20, 2014 12:07 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Balancing security and privacy with EZproxy

 Personally, I'd be tempted to go the IP lockout route myself since the
 patterns should be clear in the logs, but be aware that # megabytes gives a
 reasonable level of control because you can set to log rather than lock
 out.
 I think the risk of locking legitimate users is low. Although people can
 download mixed materials, my guess is that your abusing accounts are not
 watching loads of video.

 There are things you can do with user names that would make it easy enough
 to uncover abuse without unduly compromising privacy. For example, you
 could
 flush your logs frequently while extracting the number of downloads you're
 interested from individual users. Abuse accounts will be immediately
 obvious. BTW, you can do some funky things with EZP that include
 conditional
 logic, regexp searches, and rewriting that might be helpful.

 Any path you take will protect user privacy far more than just about any
 other site they visit. Plus, whoever maintains your network will
 occasionally need to monitor specific computers to mitigate a wide variety
 of problems. Systems used as a platform for abusive behavior, harassment,
 or
 activity that causes harm to others get locked out and/or blacklisted which
 will really hose your users. Getting that kind of thing cleared up takes
 time because most places aren't nearly as forgiving as libraries.

 kyle


 On Wed, Nov 19, 2014 at 8:47 PM, Dan Scott deni...@gmail.com wrote:

  On Wed, Nov 19, 2014 at 4:06 PM, Kyle Banerjee
  kyle.baner...@gmail.com
  wrote:
 
   There are a number of technical approaches that could be used to
   identify which accounts have been compromised.
  
   But it's easier to just make the problem go away by setting usage
   limits
  so
   EZP locks the account out after it downloads too much.
  
 
  But EZProxy still doesn't let you set limits based on the type of
  download.
  You therefore have two very blunt sledge hammers with UsageLimit:
 
  - # of downloads (-transfers)
  - # of megabytes downloaded (-MB)
 
  # of downloads is effectively useless because many of our electronic
  resource platforms (hi Proquest and EBSCOHost) make between 50 and 150
  requests for JavaScript, CSS, and images per page, so you have to set
  your thresholds incredibly high to avoid locking out users who might
  be actively paging through search results. Any savvy abuser will just
  script their requests to avoid all of the JS/CSS/images to derive a
  list of PDFs, and then download just the PDFs, thereby staying well
  under the usage limits that legit users require... and I've seen
  exactly that happen through our proxy.
 
  # of megabytes downloaded is a pretty blunt tool as well, given that
  our multimedia-enriched databases now often serve up video and audio
  as well as HTML, images, and PDF files. For the pure audio and video
  streaming sites such as Naxos or Curio, you can set higher limits; but
  as vendors increasingly enrich their databases with audio and video,
  you're going to have to increase your general limits as well... and
  you can pull down a ton of PDFs under that cover.
 
  So no, I don't think it's easy to make the problem go away through the
  suggested approach, unless you're willing to err on the side of
  locking out legitimate users.
 



Re: [CODE4LIB] Balancing security and privacy with EZproxy

2014-11-19 Thread Kyle Banerjee
There are a number of technical approaches that could be used to identify
which accounts have been compromised.

But it's easier to just make the problem go away by setting usage limits so
EZP locks the account out after it downloads too much. Alternatively, just
block the Chinese IP's unless you have students/faculty accessing resources
from there.

kyle

On Wed, Nov 19, 2014 at 12:52 PM, Joshua Welker wel...@ucmo.edu wrote:

Balancing security and privacy with EZproxy

 In recent months, we have been contacted several times by one of our
 vendors about our databases being accessed by rogue Chinese IP addresses.
 With the massive proliferation of online security breaches and password
 dumps, attackers are gaining access to student accounts and using them to
 access subscription resources through EZproxy. The vendor catches this
 happening and alerts us sometimes, but probably more often than not we have
 no idea. When we do find out, we force the students to change their
 passwords.

 We currently log IP addresses in EZproxy and can see when one of these
 rogue IP addresses is accessing a resource. However, we do not log user IDs
 in EZproxy, so we can’t tell which student account was compromised. Logging
 the user IDs would be a quick fix, but it has major privacy implications
 for our patrons, as we would have a record of every document they access.
 Have any other institutions encountered this problem? Are any best
 practices established for how to deal with these security breaches?

 I apologize for cross-posting.

 Josh Welker
 Information Technology Librarian
 James C. Kirkpatrick Library
 University of Central Missouri
 Warrensburg, MO 64093
 JCKL 2260
 660.543.8022



Re: [CODE4LIB] Stack Overflow

2014-11-04 Thread Kyle Banerjee
On Tue, Nov 4, 2014 at 7:34 AM, Schulkins, Joe 
joseph.schulk...@liverpool.ac.uk wrote:

 To be honest I absolutely hate the whole reputation and badge system for
 exactly the reasons you outline, but I can't deny that I do find the family
 of Stack Exchange sites extremely useful and by comparison Listservs just
 seem very archaic to me as it's all too easy for a question (and/or its
 answer) to drop through the cracks of a popular discussion. Are Listservs
 really the best way to deal with help? I would even prefer a Drupal site...


The advantage of a list that gets pushed out to everyone is that it is an
ongoing conversation that helps the community keep connected and grow. Even
if technical assistance is a part of that conversation, I see that as a
secondary benefit.

That basic questions get repeated and that questions/answers sometimes get
off track is not a problem. Quite the opposite, this format draws more
people into the conversation and makes it easier for them to connect with
others, contribute, and be inspired to do more.

kyle


Re: [CODE4LIB] MARC reporting engine

2014-11-03 Thread Kyle Banerjee
On Sun, Nov 2, 2014 at 6:29 PM, Stuart Yeates stuart.yea...@vuw.ac.nz
wrote:

 Do any of these have built-in indexing? 800k records isn't going to fit in
 memory and if building my own MARC indexer is 'relatively straightforward'
 then you're a better coder than I am.


Unless I'm missing something, this task is easier than it sounds. Since you
are interested in only a small part of the record, the memory requirements
are quite modest so you can absolutely fit it all into memory while
processing the file one record at a time. If I understand your problem
correctly, a hash of arrays or objects would make short work of this.
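
Here is the kind of thing I mean, sketched in Python with pymarc -- the
260$b tally is just a stand-in, so substitute whatever fields you are
actually reporting on:

from collections import defaultdict
from pymarc import MARCReader

tallies = defaultdict(list)                    # the hash of arrays
with open("records.mrc", "rb") as fh:          # hypothetical file name
    for record in MARCReader(fh):
        for field in record.get_fields("260"):
            if field["b"]:
                # key on the piece you care about, keep whatever else you need
                tallies[field["b"].strip(" ,:;")].append(record.title())

for key, values in sorted(tallies.items(), key=lambda kv: len(kv[1]), reverse=True)[:25]:
    print(len(values), key)

Only the tallies live in memory; the 800k records stream through one at a
time.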

One handy programming reference for people who need syntax for a variety of
commonly used tasks (i.e. practically everything you would normally need)
in more than 30 languages is PLEAC (Programming Language Examples Alike
Cookbook) http://pleac.sourceforge.net/

kyle


Re: [CODE4LIB] Metadata

2014-10-29 Thread Kyle Banerjee
 On Oct 29, 2014, at 1:52 PM, Matthew Sherman matt.r.sher...@gmail.com wrote:
 
 That is a very vague question, would you care to elaborate a bit more?

This.

If we just mention standards we use, you'll get drowned in an alphabet soup of
acronyms. If you could say a few words about what you have and what you want to
do, we could be more helpful.

Kyle


Re: [CODE4LIB] Why learn Unix?

2014-10-27 Thread Kyle Banerjee
 On Oct 27, 2014, at 10:02 AM, Siobhain Rivera siori...@indiana.edu wrote:
 
  what do you think are reasons
 librarians need to know Unix, even if they aren't in particularly tech
 heavy jobs?

The best reason is so that you can understand the problems you're working with 
as well as potential solutions better.

If you know about cars, you'll have a much easier time when you go to the 
mechanic because you'll be able to understand and communicate your needs 
better. You'll ask better questions and understand which proposed courses of 
action are best for your situation even if you never intend to do the work 
yourself.

Technology is like that. Libraries are incredibly dependent on technology, and 
it's a lot easier to get things done if you understand what tools/methods the 
services and people you rely on use.

Kyle


Re: [CODE4LIB] Recommendations for image de-duping software?

2014-10-16 Thread Kyle Banerjee
Could you say something about the type of dup detection you need? Are we
talking true duplicates, or possibly the same image in multiple formats,
cropped, etc? Roughly how many images (thousands, tens of thousands, etc)
and how big are they? Also, what did you try that did not meet your needs?
Thanks,

kyle

On Wed, Oct 15, 2014 at 2:56 PM, Shipley, Sarah sarah.ship...@seattle.gov
wrote:

 Hi,

 I was wondering if anyone had any recommendations for image de-duping
 software that compares the images rather than checksums.  We're using
 Visual Similarity Duplicate Image Finder, but find it's not as accurate as
 we'd like.   We have a very large number of images to de-dupe in our photo
 archives and with the current software can't find a balance of comparison
 that finds all the dups without producing a lot of false positives.



 Sarah Shipley, CA
 Digital Asset Manager
 Legislative Department - Office of the City Clerk
 http://www.seattle.gov/leg/clerk/
 600 Fourth Avenue, Floor 3
 PO Box 94728
 Seattle, WA 98124-4728
 206.684.8119


Re: [CODE4LIB] Requesting a Little IE Assistance

2014-10-13 Thread Kyle Banerjee
You could encode it quoted-printable or mess with Content-Disposition
HTTP headers.

But using these hacks or others mentioned on your data to accommodate this
use case doesn't strike me as a great idea since solutions like this don't
age well.

You might suggest to your supervisor to right click and download and then
view in something else like notepad which can be set to word wrap. Or
select all and paste wherever.

Alternatively, if the supervisor doesn't actually read the emails, say that
that everyone that needs to can read the emails just fine, but there seems
to be an issue with his or her machine ;)

kyle

On Mon, Oct 13, 2014 at 1:40 PM, Matthew Sherman matt.r.sher...@gmail.com
wrote:

 Thanks for the insights.  I was really hoping IE had a setting.  The
 problem is that these are txt files with copies of the permissions e-mails
 for our institutional repository that we store in the backend of the record
 in DSpace.  So I do not know that I can edit the HTML to make them display
 properly in IE.  The real frustration is that they do display, and the
 Firefox, Chrome, Safari, ect. display them fine, but IE does not and this
 supervisor only seems to use IE.

 On Mon, Oct 13, 2014 at 4:21 PM, Andrew Anderson and...@lirn.net wrote:

  I’ve never attempted this, but instead of linking to the text files
  directly, can you include the text files in an iframe and leverage that
  to apply sizing/styling information to the iframe content?
 
  Something like:
 
  <html>
  <body>
  <iframe src="/path/to/file.txt"></iframe>
  </body>
  </html>
 
  That structure, combined with some javascript tricks might get you where
  you need to be:
 
  http://stackoverflow.com/questions/4612374/iframe-inherit-from-parent
 
  Of course, if you’re already going that far, you’re not too far removed
  from just pulling the text file into a nicely formatted container via
 AJAX,
  and styling that container as needed, without the iframe hackery.
 
  --
  Andrew Anderson, Director of Development, Library and Information
  Resources Network, Inc.
  http://www.lirn.net/ | http://www.twitter.com/LIRNnotes |
  http://www.facebook.com/LIRNnotes
 
  On Oct 13, 2014, at 9:59, Matthew Sherman matt.r.sher...@gmail.com
  wrote:
 
   For anyone who knows Internet Explore, is there a way to tell it to use
   word wrap when it displays txt files?  This is an odd question but one
 of
   my supervisors exclusively uses IE and is going to try to force me to
   reupload hundreds of archived permissions e-mails as text files to a
   repository in a different, less preservable, file format if I cannot
 tell
   them how to turn on word wrap.  Yes it is as crazy as it sounds.  Any
   assistance is welcome.
  
   Matt Sherman
 



Re: [CODE4LIB] Forwarding blog post: Apple, Android and NFC – how should libraries prepare? (RFID stuffs)

2014-10-07 Thread Kyle Banerjee

 I think code4 lib is fine as it is, but I think we definitely need a
 professional organization for librarians that code. These talks of
 standards and guidelines may reflect such a need. I think LITA is awesome
 as well! But is there not a need for something else?


Aside from the library specific organizations mentioned already, there are
plenty of professional organizations for librarians who code. I'm partial
to ACM, but IEEE is another obvious choice and there are others.  Mixing it
up with people who come from different backgrounds and do different things
is fun, exposes you to more stuff, and prevents intellectual inbreeding.

Don't discount local groups. They're normally less specialized, but face
time helps you make connections between people, systems, methods, etc that
you otherwise wouldn't.

kyle


Re: [CODE4LIB] What is the real impact of SHA-256? - Updated

2014-10-03 Thread Kyle Banerjee
On Thu, Oct 2, 2014 at 3:47 PM, Simon Spero sesunc...@gmail.com wrote:

 Checksums can be kept separate (tripwire style).
 For JHU archiving, the use of MD5 would give false positives for duplicate
 detection.

 There is no reason to use a bad cryptographic hash. Use a fast hash, or use
 a safe hash.


I have always been puzzled why so much energy is expended on bit integrity
in the library and archival communities. Hashing does not accommodate
modifications, such as internal metadata edits or compression, that do not
compromise integrity. And if people who can access the files can also
access the hashes, there is no contribution to security. Also, wholesale
hashing of repositories scales poorly. My guess is that the biggest threats
are staff error or rogue processes (i.e. bad programming). Any malicious
destruction/modification is likely to be an inside job.

In reality, using file size alone is probably sufficient for detecting
changed files -- if dup detection is desired, then hashing the few that dup
out can be performed. Though if dups are an actual issue, it reflects
problems elsewhere. Thrashing disks and cooking the CPU for the purposes
libraries use hashes for seems way overkill, especially given that basic
interaction with repositories for depositors, maintainers, and users is
still in a very primitive state.
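
If it helps, the sweep I have in mind is no more complicated than this
sketch (Python; the repository root and manifest name are made up):

import hashlib, json, os
from collections import defaultdict

def md5sum(path, chunk=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

sizes = {}
for d, _, files in os.walk("/repo"):                 # hypothetical root
    for f in files:
        p = os.path.join(d, f)
        sizes[p] = os.path.getsize(p)

# change detection: anything whose size differs from the last sweep
# (seed manifest.json with {} the first time through)
old = json.load(open("manifest.json"))
changed = [p for p, s in sizes.items() if old.get(p) != s]

# dup detection only hashes files that collide on size
by_size = defaultdict(list)
for p, s in sizes.items():
    by_size[s].append(p)
for group in (g for g in by_size.values() if len(g) > 1):
    for p in group:
        print(md5sum(p), p)

json.dump(sizes, open("manifest.json", "w"))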

kyle


Re: [CODE4LIB] What is the real impact of SHA-256? - Updated

2014-10-03 Thread Kyle Banerjee
On Fri, Oct 3, 2014 at 7:26 AM, Charles Blair c...@uchicago.edu wrote:

 Look at slide 15 here:
 http://www.slideshare.net/DuraSpace/sds-cwebinar-1

 I think we're worried about the cumulative effect over time of
 undetected errors (at least, I am).


This slide shows that data loss via drive fault is extremely rare. Note
that a bit getting flipped is usually harmless. However, I do believe that
data corruption via other avenues will be considerably more common.

My point is that the use case for libraries is generally weak and the
solution is very expensive -- don't forget the authenticity checks must
also be done on the good files. As you start dealing with more and more
data, this system is not sustainable for the simple reason that maintained
disk space costs a fortune and network capacity is a bottleneck. It's no
big deal to do this on a few TB since our repositories don't have to worry
about the integrity of dynamic data, but you eventually get to a point
where it sucks up too many system resources and consumes too much
expertise.

Authoritative files really should be offline but if online access to
authoritative files is seen as an imperative, it at least makes more sense
to just do something like dump it all in Glacier and slowly refresh
everything you own with authoritative copy. Or better yet, just leave the
stuff there and just make new derivatives when there is any reason to
believe the existing ones are not good.

While I think integrity is an issue, I think other deficiencies in
repositories are  more pressing. Except for the simplest use cases, getting
stuff in or out of them is a hopeless process even with automated
assistance. Metadata and maintenance aren't very good either. That you
still need coding skills to get popular platforms that have been in use for
many years to ingest and serve up things as simple as documents and images
speaks volumes.

kyle


Re: [CODE4LIB] Non-library job boards to advertise a developer position widely

2014-10-03 Thread Kyle Banerjee
Depending on customs in your area, it can make sense to post real jobs to
Craigslist.

kyle

On Fri, Oct 3, 2014 at 11:57 AM, Francis Kayiwa kay...@pobox.com wrote:

 On 10/03/2014 02:52 PM, Kim, Bohyun wrote:

 Hi all,

 Which non-library job boards would be good to advertise a web developer
 job posting widely? I only have usual suspects (Indeed.com, Monster.com,
 Glassdoor.com, Engieerjobs.com(http:/www.engineerjobs.com/jobs/
 software-engineering/%20), SimplyHired.com) and library listservs so
 far. So I am hoping to catch some job boards frequented by software
 developers.

 Any Suggestions? (Local job boards in Maryland, D.C, Virginia would be
 also super helpful; although the position allows remote work so I would
 like to advertise as widely as we can.)



 I've done quite a number of projects from Flex Jobs[0]

 Not sure how much it costs to post there but I don't imagine it would be
 much more than the one's you list above.

 ./fxk

 [0] http://www.flexjobs.com/telecommute/employers

 --
 You single-handedly fought your way into this hopeless mess.



Re: [CODE4LIB] Reconciling corporate names?

2014-09-29 Thread Kyle Banerjee
IMO, API isn't the best tool for this job. My inclination would be to just
download the LCNAF data, normalize source and comparison data, and then
compare via hash.

That will be easier to write, and you'll be able to do thousands of
comparisons per second.

kyle

On Mon, Sep 29, 2014 at 8:24 AM, Jonathan Rochkind rochk...@jhu.edu wrote:

 For yet another data set and API that may or may not meet your needs,
 consider VIAF -- Virtual International Authority File, operated by OCLC.

 The VIAF's dataset includes the LC NAF as well as other national authority
 files, I'm not sure if the API is suitable to limiting matches to the LC
 NAF, I haven't done much work with it, but I know it has an API.

 http://oclc.org/developer/develop/web-services/viaf.en.html


 On 9/29/14 10:18 AM, Trail, Nate wrote:

 The ID.loc.gov site has a good known label service described here under
 "known label retrieval":
 http://id.loc.gov/techcenter/searching.html

 Use  Curl and content negotiation to avoid screen scraping, for example,
 for LC Name authorities:

 curl -L -H "Accept: application/rdf+xml" \
 http://id.loc.gov/authorities/names/label/Library%20of%20Congress

 Nate

 ==
 Nate Trail
 LS/TECH/NDMSO
 Library of Congress
 n...@loc.gov


 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Simon Brown
 Sent: Monday, September 29, 2014 9:38 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Reconciling corporate names?

 You could always web scrape, or download and then search the LCNAF with
 some script that looks like:

 #Build query for webscraping
 query = paste("http://id.loc.gov/search/?q=", URLencode("corporate name
 here"), "&q=cs%3Ahttp%3A%2F%2Fid.loc.gov%2Fauthorities%2Fnames")

 #Make the call
 result = readLines(query)

 #Find the lines containing "Corporate Name"
 lines = grep("Corporate Name", result)

 #Alternatively use approximate string matching on the downloaded LCNAF data
 query <- agrep("corporate name here", LCNAF_data_here)

 #Parse for whatever info you want
 ...

 My native programming language is R so I hope the functions like paste,
 readLines, grep, and URLencode are generic enough for other languages to
 have some kind of similar thing.  This can just be wrapped up into a for
 loop:
 for(i in 1:4){...}

 Web scraping the results of one name at a time would be SLOW and
 obviously using an API is the way to go but it didn't look like the OCLC
 LCNAF API handled Corporate Name.  However, it sounds like in the previous
 message someone found a work around.  Best of luck! -Simon






 On Mon, Sep 29, 2014 at 8:45 AM, Matt Carruthers mcarr...@umich.edu
 wrote:

  Hi Patrick,

 Over the last few weeks I've been doing something very similar.  I was
 able to figure out a process that works using OpenRefine.  It works by
 searching the VIAF API first, limiting results to anything that is a
 corporate name and has an LC source authority.  OpenRefine then
 extracts the LCCN and puts that through the LCNAF API that OCLC has to
 get the name.  I had to use VIAF for the initial name search because
 for some reason the LCNAF API doesn't really handle corporate names as
 search terms very well, but works with the LCCN just fine (there is
 the possibility that I'm just doing something wrong, and if that's the
 case, anyone on the list can feel free to correct me).  In the end,
 you get the LC name authority that corresponds to your search term and
 a link to the authority on the LC Authorities website.

 Anyway,  The process is fairly simple to run (just prepare an Excel
 spreadsheet and paste JSON commands into OpenRefine).  The only
 reservation is that I don't think it will run all 40,000 of your names
 at once.  I've been using it to run 300-400 names at a time.  That
 said, I'd be happy to share what I did with you if you'd like to try
 it out.  I have some instructions written up in a Word doc, and the
 JSON script is in a text file, so just email me off list and I can send
 them to you.

 Matt

 Matt Carruthers
 Metadata Projects Librarian
 University of Michigan
 734-615-5047
 mcarr...@umich.edu

 On Fri, Sep 26, 2014 at 7:03 PM, Karen Hanson
 karen.han...@ithaka.org
 wrote:

  I found the WorldCat Identities API useful for an institution name
 disambiguation project that I worked on a few years ago, though my
 goal wasn't to confirm whether names mapped to LCNAF.  The API
 response

 includes

 a LCCN, and you can set it to fuzzy or exact matching, but you would
 need to write a script to pass each term in and process the results:


  http://oclc.org/developer/develop/web-services/worldcat-identities.en.
 html


 I also can't speak to whether all LC Name Authorities are
 represented, so there may be a chance of some false negatives.

 OCLC has another API, but not sure if it covers corporate names:
 https://platform.worldcat.org/api-explorer/LCNAF

 I suspect there are others on the list that know more about the
 inner workings of 

Re: [CODE4LIB] Reconciling corporate names?

2014-09-29 Thread Kyle Banerjee
After a quick search, http://id.loc.gov/download/ looks like the place to
go. I haven't downloaded it myself, but the file sizes make it look like
the right stuff.

kyle

On Mon, Sep 29, 2014 at 10:55 AM, Jean Roth jr...@nber.org wrote:

 What is the link to the downloadable LCNAF data?  --  Jean

 On Mon, 29 Sep 2014, Kyle Banerjee wrote:

 KB IMO, API isn't the best tool for this job. My inclination would be to
 just
 KB download the LCNAF data, normalize source and comparison data, and then
 KB compare via hash.
 KB
 KB That will be easier to write, and you'll be able to do thousands of
 KB comparisons per second.
 KB
 KB kyle



Re: [CODE4LIB] Reconciling corporate names?

2014-09-29 Thread Kyle Banerjee
The best way to handle them depends on what you want to do. You need to
actually download the NAF files rather than the countries or other small
files, as different kinds of data are organized differently. Just don't try
to read multigigabyte files in a text editor :)

If you start with one of the giant XML files, the first thing you'll
probably want to do is extract just the elements that are interesting to
you. A short string parsing or SAX routine in your language of choice
should let you get the information in a format you like.
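
For example, in Python a streaming parse keeps memory flat no matter how
big the file is. A sketch -- the MADS/RDF element names are my guess at
what the dump uses, so verify against the actual data:

import xml.etree.ElementTree as ET

NS = "{http://www.loc.gov/mads/rdf/v1#}"              # assumed namespace

for event, elem in ET.iterparse("lcnaf.madsrdf.xml", events=("end",)):
    if elem.tag == NS + "CorporateName":
        label = elem.findtext(NS + "authoritativeLabel")
        if label:
            print(label)
        elem.clear()                                   # discard as you go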

If you download the linked data files and you're interested in actual
headings (as opposed to traversing relationships), grep and sed in
combination with the join utility are handy for extracting the elements you
want and flattening the relationships into something more convenient to
work with. But there are plenty of other tools that you could also use.

If you don't already have a convenient environment to work on, I'm a  fan
of virtualbox. You can drag and drop things into and out of your regular
desktop or even access it directly. That way you can view/manipulate files
with the linux utilities without having to deal with a bunch of clunky file
transfer operations involving another machine. Very handy for when you have
to deal with multigigabyte files.

kyle

On Mon, Sep 29, 2014 at 11:19 AM, Jean Roth jr...@nber.org wrote:

 Thank you!  It looks like the files are available as  RDF/XML, Turtle, or
 N-triples files.

 Any examples or suggestions for reading any of these formats?

 The MARC Countries file is small, 31-79 kb.  I assume a script that
 would read a small file like that would at least be a start for the LCNAF




Re: [CODE4LIB] ruby-marc: how to sort fields after append?

2014-09-12 Thread Kyle Banerjee
On Fri, Sep 12, 2014 at 9:20 AM, Galen Charlton g...@esilibrary.com wrote:

 ...
 One caveat though -- at least in MARC21, re-sorting a MARC record
 strictly by tag number can be incorrect for certain fields...


This is absolutely true. In addition to the fields you mention, 4XX, 7XX,
and 8XX are also not necessarily in numerical order even if most records
contain them this way.  There is no way to programmatically determine the
correct sort. While this may sound totally cosmetic, it sometimes has
implications for use. Depending on how the sort mechanism works, you could
conceivably put fields that share the same tag into the wrong order.

The original question was how to resort a MARC record after appending a
field which appears to be a control number. I would think it preferable to
iterate through the fields and place it in the correct position (I'm
assuming it's not an 001) rather than append and sort.

However, record quality is such a mixed bag nowadays, and getting much
worse, that tag order is the least of the corruption issues. Besides, most
displays normalize fields so heavily that these types of distinctions
simply aren't supported anymore.

kyle


Re: [CODE4LIB] ruby-marc: how to sort fields after append?

2014-09-12 Thread Kyle Banerjee
On Fri, Sep 12, 2014 at 10:11 AM, Terry Reese ree...@gmail.com wrote:

 ...  In fact, I wouldn't even resort the data to begin with ...


Ding! Ding! Ding!

And we have a winner for easiest and most practical solution. Any user
display is either not going to display the control number being appended at
all, or it will list it wherever it is already listing it. So no need to
reposition it.

As far as grouchy catalogers go, fewer and fewer systems display bib
records that look a lot like legal documents from bygone times. There are a
few holdouts (mostly in environments where the display is optimized for
staff rather than users), but that battle was decided years ago.

kyle


Re: [CODE4LIB] Technology for Librarians / Libraries for Technologians

2014-09-04 Thread Kyle Banerjee
 I know a lot gets said (here and elsewhere) about Technology for Librarians
 - important skills and standards, what's
 important/useful/trending/ignorable, and the like. But I'd love to start a
 discussion (or join one, if it already exists elsewhere) about the other
 side of things - the library-specific stuff that experienced IT folks might
 need to learn or get used to to be successful in a library environment. Not
 just technical stuff like MARC, but also ethical issues like fair use,
 information privacy, freedom of access, and the like.


I think some of these issues are distractions as they aren't specific to
libraries, aren't really different than any IT work involving private
information (i.e. virtually all IT work), and don't require library
expertise to understand. However, on the question of whether the job of
Director of Library IT is more about librarianship or IT, I'd always
assumed the former is the case.

Library IT needs to leverage library specific knowledge/technologies to
perform functions that plain IT cannot if the cost of an independent IT
unit is to be justified. Everyone relates to public search interfaces, but
there's an entire infrastructure that makes a combination of licensed,
purchased, locally created, and borrowed resources with differential access
for various user groups (some of them external) possible.

Knowledge of formats, protocols, standards, and common practices is
helpful, but understanding business needs that are common to libraries but
not really thought of elsewhere is also essential.  If we mostly duplicate
commodity functions that are already performed elsewhere, we just set
ourselves up to be outsourced.

kyle


Re: [CODE4LIB] Library Privacy, RIP (Was: Canvas Fingerprinting by AddThis)

2014-08-17 Thread Kyle Banerjee
You need to cut holes so you can see -- I should have mentioned that. Be
sure to wear sunglasses to confound remote retinal scanners...


On Sat, Aug 16, 2014 at 1:59 PM, Cary Gordon listu...@chillco.com wrote:

 I tried a paper bag, but it was very hard to find books.


 On Fri, Aug 15, 2014 at 4:34 PM, Kyle Banerjee kyle.baner...@gmail.com
 wrote:

  On Fri, Aug 15, 2014 at 3:02 PM, Jason Bengtson j.bengtson...@gmail.com
 
  wrote:
 
   ...
  
   Generally speaking, I think  surveillance is wretched stuff. But there
  is a
   point at which the hand wringing becomes a bit much. I agree with Jon
 in
   that, while things are at a critical point, the technologies of
 security
   and anonymity will inevitable improve. In fact, the cruddy state of
  things
   has been adding momentum to that progress...
  
 
  And there are always the tried and tested technologies that have been
  around for ages. For example, if users wore paper bags over their heads,
 it
  would protect their anonymity and afford some privacy while they used
  resources in the library -- particularly when they need assistance.
   Anonymous checkout privileges secured with a bitcoin deposit could
 ensure
  accountability.
 
  As things stand, many if not most library staff know all kinds of things
  about their users. The paper bag solution (actually another material
 should
  be chosen to make it safer for smokers) is a major step towards
 rectifying
  this privacy and service issue. ;-)
 



 --
 Cary Gordon
 The Cherry Hill Company
 http://chillco.com



Re: [CODE4LIB] Hiring strategy for a library programmer with tight budget - thoughts?

2014-08-15 Thread Kyle Banerjee
 I am in a situation in which a university has a set salary guideline for
 programmer position classifications: if I want to hire an entry-level
 dev, the salary is too low to be competitive, and if I want to hire a more
 experienced dev in a higher classification, the competitive salary amount
 exceeds what my library can afford. So as a compromise I am thinking
 about going the route of posting a half-time position in a higher
 classification so that the salary would be at least competitive. It will
 get full-time benefits on a pro-rated basis. But I am wondering if this
 strategy would be viable or not.

 Also, has anyone had experience hiring a developer to telework completely
 from another state when you do not have previous experience working with
 her/him? This seems a somewhat risky strategy to me, but I am wondering if
 it may attract more candidates, particularly when the position is half
 time.


I think your idea of trying to be more competitive in a higher
classification is a solid one. The way natural selection works when you
don't pay competitively is that the good people move along relatively soon
while those who are less employable tend to stick around. This causes
trouble in the long term.

Hiring from another state can work great, and you'll probably need to do
this if you can only offer half time. As a practical matter, it works just
as well as a short-distance telecommute since you interact the same way.
Going the contract route can also work, but keep in mind that might have a
huge impact on your range of motion as policies governing outside
contractors can make simple things complicated. I would avoid contract
labor for anything you intend to maintain over the long term. Even if
someone can build something that somehow requires no troubleshooting or
maintenance, there will be heck to pay when technology cycles force
migrations.

kyle


Re: [CODE4LIB] Library Privacy, RIP (Was: Canvas Fingerprinting by AddThis)

2014-08-15 Thread Kyle Banerjee
On Fri, Aug 15, 2014 at 3:02 PM, Jason Bengtson j.bengtson...@gmail.com
wrote:

 ...

 Generally speaking, I think  surveillance is wretched stuff. But there is a
 point at which the hand wringing becomes a bit much. I agree with Jon in
 that, while things are at a critical point, the technologies of security
 and anonymity will inevitable improve. In fact, the cruddy state of things
 has been adding momentum to that progress...


And there are always the tried and tested technologies that have been
around for ages. For example, if users wore paper bags over their heads, it
would protect their anonymity and afford some privacy while they used
resources in the library -- particularly when they need assistance.
 Anonymous checkout privileges secured with a bitcoin deposit could ensure
accountability.

As things stand, many if not most library staff know all kinds of things
about their users. The paper bag solution (actually another material should
be chosen to make it safer for smokers) is a major step towards rectifying
this privacy and service issue. ;-)


Re: [CODE4LIB] Dewey code

2014-08-11 Thread Kyle Banerjee

 We are a church with 1500 books we would like to put on our website, and
 thought we would use this workflow:

 1.  Create barcode from isbn number and print label.
 2.  Acquire Dewey number from Library of Congress via z39.50, and
 print that to a label.
 3.  Affix labels to the books.
 4.  Place marc records into a Postgresql database and allow users to
 search via a browser, using Ruby on Rails for the front and back ends.

 At the moment I'm trying to figure out step 2. I'm the church
 volunteer webmaster and not a coder, working with two other volunteers who
 happen to be career professional librarians but not programmers.  If the
 Dewey numbers generated by the LC are insufficient, we'll tweak them over
 time. I just need to know how to isolate that one field to print it to our
 thermal label printer.


To answer your question directly, there are a number of ways to get the
numbers. You could transform the record to text using any MARC tool or
retrieve these values via web service from LC or OCLC. My guess is that the
classification numbers you get will work well enough for your purposes.
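
If you do end up with a batch of MARC records on disk, isolating the number
is only a few lines in something like pymarc (a sketch; 082$a is where the
Dewey classification lives in the bib record):

from pymarc import MARCReader

with open("records.mrc", "rb") as fh:      # hypothetical batch from step 2
    for record in MARCReader(fh):
        for field in record.get_fields("082"):
            if field["a"]:
                print(field["a"])          # send this to the label printer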

However, I think you may be using a chain saw to cut butter. My sense is
that services designed for personal libraries that others have already
suggested would probably be more practical and a lot easier to maintain.
Keep in mind that whatever you create must be maintained by the volunteers
that follow you.

Frankly, if I were in your shoes, I'd be inclined to go with LibraryThing
and use old skool paper cards to handle circulation. Nowadays, people tend
to use technology for everything, even when analog/manual methods are more
efficient/better.

For classification, it really doesn't matter how you shelve 1500 books.
More records will contain LC than Dewey numbers so even if the system is
overkill, it might be easier to use. Just do whatever seems to work, and
if you can't decide, going by author or title really isn't that bad.
Frankly, you'd be able to find things even if they were shuffled randomly.

kyle


Re: [CODE4LIB] Dewey code

2014-08-08 Thread Kyle Banerjee
Label printing practices vary by library. Just out of curiosity, why are
you getting this information from a MARC file rather than the ILS? At
many/most libraries, you'd need local Cuttering, item-specific info (e.g.
volume/copy number), etc. that is not available in the bib record.

kyle


On Fri, Aug 8, 2014 at 12:33 PM, Tom Connolly tedwardconno...@gmail.com
wrote:

 Is there an open source way to format the dewey code for printing book
 labels? Or can someone tell me how to isolate just the dewey number from a
 marc file (I have MarcEdit; is there a better tool for this simple task?)
 so it is the only field sent to the printer? (I'm using Ubuntu 14.04 and
 printing to a Dymo 450) Thanks
 Tom Connolly
 St. Paul's Episcopal Church, Naples FL
 webmaster



[CODE4LIB] Publishing large datasets

2014-07-23 Thread Kyle Banerjee
We've been facing increasing requests to help researchers publish datasets.
There are many dimensions to this problem, but one of them is applying
appropriate metadata and mounting them so they can be explored with a
regular web browser or downloaded by expert users using specialized tools.

Datasets often are large. One that we used for a pilot project contained
well over 10,000 objects with a total size of about 1 TB. We've been asked
to help with much larger and more complex datasets.

The pilot was successful but our current process is neither scalable nor
sustainable. We have some ideas on how to proceed, but we're mostly making
things up. Are there methods/tools/etc you've found helpful? Also, where
should we look for ideas? Thanks,

kyle


Re: [CODE4LIB] NCIP path on a Millennium server

2014-07-22 Thread Kyle Banerjee
AFAIK, Mil doesn't support NCIP. Rather, the library has to have purchased
III's DCB product.

There is a project to allow Evergreen libraries to communicate with DCB via
NCIP at https://github.com/iNCIPit which works and is used by a few
libraries.

It contains information on both the connection and the specific NCIP syntax
you will need.

kyle


On Tue, Jul 22, 2014 at 8:16 AM, Ian Chan ic...@csusm.edu wrote:

 Hi,

 If you know the typical path and/or port on Millennium to which I would
 send an NCIP message, would you mind sharing that with me?

 Thank you in advance for your help.

 Ian

 - - - - - - - - - - - - - - -

 Ian Chan
 Systems Coordinator and Web Development Librarian
 California State University San Marcos
 KEL 1002
 tel:7607504385
 http://biblio.csusm.edu
 Skype: ian.t.chan



Re: [CODE4LIB] NCIP path on a Millennium server

2014-07-22 Thread Kyle Banerjee
One thing I forgot to mention is that their NCIP is an all-or-nothing
proposition -- you do not enable individual NCIP services at the III end.
This means that valid responses need to be sent in response to everything
(even if they only perform a null op at the responder end) or the system
will keep trying to resend the request.

kyle


On Tue, Jul 22, 2014 at 9:05 AM, Kyle Banerjee kyle.baner...@gmail.com
wrote:

 AFAIK, Mil doesn't support NCIP. Rather, the library has to have purchased
 the III's DCB product.

 There is a project to allow Evergreen libraries to communicate with DCB
 via NCIP at https://github.com/iNCIPit It works and is used by a few
 libraries.

 This will contain information both connection and the specific NCIP syntax
 you will need .

 kyle


 On Tue, Jul 22, 2014 at 8:16 AM, Ian Chan ic...@csusm.edu wrote:

 Hi,

 If you know the typical path and/or port on Millennium to which I would
 send an NCIP message, would you mind sharing that with me?

 Thank you in advance for your help.

 Ian

 - - - - - - - - - - - - - - -

 Ian Chan
 Systems Coordinator and Web Development Librarian
 California State University San Marcos
 KEL 1002
 tel:7607504385
 http://biblio.csusm.edu
 Skype: ian.t.chan





Re: [CODE4LIB] net.fun

2014-07-14 Thread Kyle Banerjee
The only problem is that some people might have difficulty obtaining audio
modems that could be made to work with their cell phones...


On Mon, Jul 14, 2014 at 8:56 AM, Riley Childs ri...@tfsgeo.com wrote:

 I know I might be a little young, but code4lib needs a BBS

 Riley Childs
 Student
 Asst. Head of IT Services
 Charlotte United Christian Academy
 (704) 497-2086
 RileyChilds.net
  Sent from my Windows Phone, please excuse mistakes

 -Original Message-
 From: Joe Hourcle onei...@grace.nascom.nasa.gov
 Sent: ‎7/‎14/‎2014 11:52 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] net.fun

 On Jul 14, 2014, at 10:44 AM, Cary Gordon wrote:

  I remember when system administrators would change the MOTD daily. The
 '80s
  were so pastoral.

 0 0 * * * /bin/fortune > /etc/motd

 or, for those running Vixie cron (which most people weren't in the 80s):

 @daily /bin/fortune > /etc/motd


 ... but then, everyone went the way of 'web portals' and the like, rather
 than assuming everyone was going to be (telnet|tn3270)ing into a (unix|cms)
 system so they could check their e-mail, nntp, gopher, etc.

 -Joe

 ps. is it disturbing that the talk of motd is making me nostalgic for
 ASCII art?





  On Monday, July 14, 2014, Joe Hourcle onei...@grace.nascom.nasa.gov
 wrote:
 
  On Jul 14, 2014, at 8:21 AM, Riley Childs wrote:
 
  My MOTDs are not as fun...
 
  RUN GET OUT OF HERE
  YOU ARE NOT WELCOME TODAY
  RESTRICTED ACCESS HERE.
 
  I would expect that in the banner, not the motd:
 
 $ more /etc/banner
 
 This US Government computer is for authorized users only. By accessing
 this system you are consenting to complete monitoring with no expectation
 of privacy. Unauthorized access or use may subject you to disciplinary
 action and criminal prosecution.
 
 
  The banner gets displayed before the login prompt, the motd gets
 displayed
  after ... there's also an assumption that the motd changes regularly, as
  it's 'message of the day' ... although most people have it be completely
  random and just call fortune or never bother changing it.
 
  -Joe
 
 
 
  --
  Cary Gordon
  The Cherry Hill Company
  http://chillco.com



[CODE4LIB] Job: Systems and Applications Librarian -- Walla Walla, WA

2014-07-02 Thread Kyle Banerjee
** Apologies for duplicate postings **

SYSTEMS  APPLICATIONS LIBRARIAN

Whitman College seeks a dynamic, creative and technically proficient
individual for the position of Systems  Applications Librarian who will
help provide leadership as the Penrose Library transitions to an expanding
digital presence.

The primary responsibility of this librarian is to identify, implement, and
support computer applications and technologies that enhance the library’s
ability to deliver services in both local and global networked
environments.  This work includes systems application upgrades,
configuration, maintenance, integration, troubleshooting, continued
evaluation, training, and design and maintenance of the Library’s website.
 This position requires strong analytical and communications skills to
develop and implement successful technology strategies for library
operations and functions.

Preference will be given to candidates who demonstrate the following:
knowledge of current issues and trends in library technology; knowledge of
contemporary web design and development and common scripting languages;
demonstrated project management abilities; the ability to operate and
maintain library integrated systems in a shared environment; knowledge of
national standards for library systems, authentication, networking, and
protocols for search and retrieval; understanding of metadata schema.
 Strong candidates will be able to evaluate the implications of adopting
new technologies, and how they can be leveraged for liberal arts college
libraries and the learners they serve.

The successful candidate will be flexible, creative, and enthusiastic. S/he
will have a demonstrated ability to work collaboratively and possess a
strong service commitment, with a demonstrated ability to plan, coordinate
and carry out complex projects.  Requires an MLS/MLIS and/or equivalent
combination of education and experience; experience working in a library
technology position, preferably in an academic setting that supported
 systems for library management, network infrastructure, digital library
services, web development, scholarly communication, research support and
emerging technologies; evidence of establishing priorities and seeing
projects through to completion.

Whitman is a private, selective, non-sectarian, residential college of the
liberal arts and sciences with approximately 1500 students and 150 faculty.
 Penrose Library has a strong service orientation, a team-oriented
approach to decision making, and provides excellent opportunities for
professional development. Penrose Library consistently ranks highly in
Princeton Review’s Best College Library category.

The College is located in Walla Walla, positioned in the heart of beautiful
SE Washington’s wine country in the foothills of the Blue Mountains. The
area allows one to experience a wide variety of recreational opportunities,
provides access to more than a dozen art galleries, three theatres, and the
oldest continuous symphony west of the Mississippi River. Whitman has
vibrant theatre and music programs and routinely invites renowned speakers
and performers to campus. Moreover, residents of the state of Washington
pay no state income tax.

A job description and application requirements are available at:
http://www.whitman.edu/hr. Application review will begin August 4, 2014 and
will continue until filled.  For more information about Whitman College see
http://www.whitman.edu.  Whitman is building a diverse academic community
and encourages minorities, women and persons with disabilities to apply.
 Experience that contributes to the diversity of the College is
appreciated.

Dalia L. Corkrum
College Librarian
OCLC Global Council Delegate for the Americas, 2013-2015
Penrose Library | Whitman College | 345 Boyer Ave. | Walla Walla, WA 99362
cork...@whitman.edu | 509-527-5193 | 509-527-5900 (fax)


Re: [CODE4LIB] Excel to XML (for a Drupal Feeds import)

2014-06-16 Thread Kyle Banerjee
I'd just do this the old fashioned way. Awk is great for problems like
this. For example, if your file is tab delimited, the following should work

awk 'BEGIN {FS="\t"} {if ($2 != "") question = $2; print $1, question, $3}' yourfile

In the example above, I just print the fields but you could easily encase
them in tags.

kyle


On Mon, Jun 16, 2014 at 9:29 AM, Ryan Engel rten...@wisc.edu wrote:

 Thanks for the responses, on the list and off, so far.

 As I'm sure is true for so many of us, my interest in learning more about
 how to solve this type of problem is balanced against my need to just get
 the project done so I can move on to other things.  One of the great things
 about this list is the ability to learn from the collective experiences of
 colleagues.  For this project specifically, even clues about better search
 terms are useful; as Chris Gray pointed out, basic Google searches present
 too many hits.

 I did try following the Create an XML data file and XML schema file from
 worksheet data instructions on the Microsoft site.  And it did produce an
 XML document, but it wasn't able to transform this:
 Row1    Question1    Q1Answer1
 Row2                 Q1Answer2

 ...into something like this:
 <row1>Row One Data</row1>
 <question1>This is a question</question1>
 <answers>
 <q1answer1>Answer 1</q1answer1>
 <q1answer2>Answer2</q1answer2>
 </answers>

 Instead, I could get it to either convert every row into its own XML
 entry, meaning that I had a lot of answers with no associated questions, or
 I got an XML file that had 1 question with EVERY SINGLE answer nested
 beneath it -- effectively all questions after the first question were
 ignored.  Based on those results, I wasn't sure if there is more tweaking I
 could do in Excel, or if there is some programmed logic in Excel that can't
 be accounted for when associating a schema.


 Another suggestion I received was to fill the question column so that
 every row had a question listed.  I did consider this, but the problem then
 is during the data import, I'd have to convince my CMS to put all the
 answers back together based on the question, something I'm sure Drupal
 COULD do, but I'm not sure how to do that either.


 Finally, this project is a spreadsheet with 225,270 rows, so you can
 imagine why I'd like a process that is reasonably trustworthy AND that can
 run locally.


 Anyway, any/all additional suggestions appreciated, even if they are "try
 searching for blah blah python parser" or "I made something that solves a
 similar process, and you can download it from Git."

 Ryan
 ___

 Ryan Engel
 Web Stuff
 UW-Madison

 Dana Pearson mailto:dbpearsonm...@gmail.com
 June 13, 2014 at 7:14 PM
 I don't use Excel, but a client did who wanted to use XSL I had created
 (ONIX to MARC) to transform bibliographic metadata in Excel to XML. The
 built-in Excel XML converter was not very helpful since empty cells were
 skipped, so it was impossible to use that result.

 There is an add-on that allows you to map your data to XML elements by
 creating a schema, which is pretty cool.

 http://bit.ly/1jpwtqM

 This might be helpful.

 regards,
 dana





 Terry Brady mailto:tw...@georgetown.edu
 June 13, 2014 at 6:53 PM
 The current version of Excel offers a save as XML option.

 It will produce something like this. There is other wrapping metadata, but
 the table is pretty easy to parse.

 <Table ss:ExpandedColumnCount="3" ss:ExpandedRowCount="7" x:FullColumns="1"
 x:FullRows="1" ss:DefaultRowHeight="15">
 <Row>
 <Cell ss:StyleID="s62"><Data ss:Type="String">row 1</Data></Cell>
 <Cell><Data ss:Type="String">question 1</Data></Cell>
 <Cell><Data ss:Type="String">answer 1</Data></Cell>
 </Row>
 <Row>
 <Cell ss:StyleID="s62"><Data ss:Type="String">row 2</Data></Cell>
 <Cell ss:Index="3"><Data ss:Type="String">answer 2</Data></Cell>
 </Row>
 <Row>
 <Cell ss:StyleID="s62"><Data ss:Type="String">row 3</Data></Cell>
 <Cell ss:Index="3"><Data ss:Type="String">answer 3</Data></Cell>
 </Row>
 <Row>
 <Cell ss:StyleID="s62"><Data ss:Type="String">row 4</Data></Cell>
 <Cell><Data ss:Type="String">question 2</Data></Cell>
 <Cell><Data ss:Type="String">answer 1</Data></Cell>
 </Row>
 <Row>
 <Cell ss:StyleID="s62"><Data ss:Type="String">row 5 </Data></Cell>
 <Cell ss:Index="3"><Data ss:Type="String">answer 2</Data></Cell>
 </Row>
 <Row>
 <Cell ss:StyleID="s62"><Data ss:Type="String">row 6</Data></Cell>
 <Cell><Data ss:Type="String">quest </Data></Cell>
 <Cell><Data ss:Type="String">answer 3</Data></Cell>
 </Row>
 <Row>
 <Cell ss:StyleID="s62"/>
 </Row>
 </Table>





 Ryan Engel mailto:rten...@wisc.edu
 June 13, 2014 at 4:28 PM
 Hello -

 I have an Excel spreadsheet that, for the purposes of an easy import into
 a Drupal site, I'd like to convert to XML.  I know people more
 knowledgeable than I could code up something in Python or Perl to convert a
 CSV version of the data to XML (and I have a colleague who offered to do
 just that for me), but I am looking for recommendations for something more
 immediately accessible.

 Here's an idea of how the spreadsheet is structured:

 Row1    Question1    Q1Answer1
 Row2                 Q1Answer2
 Row3                 Q1Answer3
 Row4    Question2

Re: [CODE4LIB] Jobs Digest

2014-06-04 Thread Kyle Banerjee
On Wed, Jun 4, 2014 at 1:55 PM, Eric Lease Morgan emor...@nd.edu wrote:

  C4L is not a democracy but an anarchy.

 Sometimes. We vote on conference locations. We vote on keynote talks. We
 vote for presentations. Everybody had multiple opportunities to voice their
 opinion. I think this vote should count too. —ELM


We should vote on whether this vote counts


[CODE4LIB] Anonymizing address data

2014-06-02 Thread Kyle Banerjee
HIPAA-compliant data cannot include personally identifiable information, a
category that includes addresses. The "safe harbor" approach, under which
geographic subdivisions smaller than a state cannot be used, frequently
renders data useless.

The expert determination method is always an option and precompiling can
work in certain cases, but I was wondering what other methods people have
successfully employed? Thanks,

kyle


Re: [CODE4LIB] Job Interview : A Libcoder's Helpful Advices

2014-05-12 Thread Kyle Banerjee
On Mon, May 12, 2014 at 7:29 AM, Bigwood, David dbigw...@hou.usra.eduwrote:

 Asking questions is an essential part of the interview. You are
 interviewing them as well as them you. But, never ask questions that can be
 easily answered by browsing their website or common reference works.


It blows my mind how many people don't do their homework. You need to give
your potential employer real thought. Don't just spend 45 minutes browsing
the website. Think about what they've done and are hoping to do -- same
goes for people you'd be working with.

Hiring someone is the most important/expensive thing that organizations do.
It's very possible that the place that hires you will invest more than $1
million in you. You have a lot of skin in the game too -- your choice of
job determines where you are and what you do most of your waking hours for
a long time. You owe them and yourself much more than a few stock questions
that anyone could come up with.

Good questions show what about them interests you and they help everyone
understand each other better. An interview is a conversation where both
sides need to engage. Questions asked either by the interviewer or the
interviewee just for the sake of asking something are boring and won't help
you or your potential employer.

kyle


Re: [CODE4LIB] Job Interview : A Libcoder's Helpful Advices

2014-05-12 Thread Kyle Banerjee
On Mon, May 12, 2014 at 11:32 AM, Tom Johnson 
johnson.tom+code4...@gmail.com wrote:

 
 At the very least, if you're going to hire for personality traits, you need
 to do some very serious thinking about whether and why you think those
 traits will actually make the person more effective at their job.  Do the
 reasons amount to prejudice?  Are they exploitative in some other way?


This is what it boils down to. These traits can be slippery at times, but
they are still essential.

A person is much more than a set of skills, how long they've warmed their
chair, and whatever they can tick off on their resume. Whatever you do, you
have to engage well with the rest of the team and help bring out best in
others. You need to identify and be working on problems before anyone knows
they exist. No amount of knowledge can make up for a bad attitude and lack
of motivation.

This works both ways. What makes a job great or lousy is rarely what people
ask about. You can have a great title, great pay, good budget, etc, but
that does you little good if you have to work in a  dysfunctional
atmosphere.

One of the questions I always ask is "If I'm hired, what will I really wish
I asked a year from now?" You want to know about turf and trust issues,
screwball personnel situations, and a host of other things that make or
break an environment.

There absolutely is such a thing as fit, and I've been told before that I
wasn't a fit. That's not fun if you don't have a job already (which was the
case for me at the time). But the institution that did this was absolutely
right. None of us fit everywhere, so when the fit is bad, you're way better
off going someplace where you have a better chance of succeeding.

kyle


Re: [CODE4LIB] separate list for Jobs

2014-05-09 Thread Kyle Banerjee

 I have filters set up, and find they just don't work reliably. OK, they
 work 9 times out of 10, but things always slip through.

 Imho, there are more people inconvenienced by having jobs on the list
 (setting up filters, filters not working, unable to filter digests, etc.)
 than there are people inconvenienced by having a separate list for jobs (is
 there really anyone that can't sign up for a separate list?


The same could be said for virtually everything here.

Many of the discussions (such as this one) aren't technical at all. Those
that are tend to be dominated by a narrow range of objectives, methods, and
tools representing only a small part of library operations and
technologies. This means every topic most likely appeals only to a minority
of subscribers.

I believe there is real value in a common experience, as well as in not
contributing to the expanding fragmentation of the library community into
needlessly specialized microcosms.

There are multiple approaches that can work for people who are overwhelmed.
The easiest is simply to set to nomail and read from the web when there is
time/inclination. I do this for several lists myself, and c4l may soon be
joining that group. The filter option is there. You can simply not read
things that don't interest you -- I probably only read about 5% of what I
receive and the job postings are not among the emails I read.

I'm trying to figure out why I'm even reading this thread let alone
participating, but that it exists at all intrigues me. Code4lib doesn't
exist as an organization and has no ability approve or disapprove anything.
This means that anyone who thinks a new list should exist can set it up.

kyle


Re: [CODE4LIB] Is it time to invite zoia to join the mailing list?

2014-05-08 Thread Kyle Banerjee
Aside from the issue that giving specific individuals or bots preferential
treatment reverses progress made towards greater equality, I would be
concerned about the quality of participation by anyone who needs an
invitation to join.

Besides, it sends a message to other bots that didn't get an invitation
even though they lurk  here (such as googlebot) that they are less
important...
On May 7, 2014 10:58 PM, Simon Spero sesunc...@gmail.com wrote:

 ( In case the form doesn't get embedded
 https://docs.google.com/forms/d/1c2vlNveUs_VA4xeGXEC-
 c1ro5zaQ_dF73Pa1LzkBHQo/viewform?usp=send_form
 )

 I've invited you to fill out the form Robot Rights. To fill it out,
 visit:
 https://docs.google.com/forms/d/1c2vlNveUs_VA4xeGXEC-
 c1ro5zaQ_dF73Pa1LzkBHQo/viewform?sidc=0w=1tokenusp=mail_form_link



Re: [CODE4LIB] separate list for jobs

2014-05-06 Thread Kyle Banerjee
On Tue, May 6, 2014 at 9:59 AM, Richard Sarvas richard.sar...@lib.uconn.edu
 wrote:

 Not to be a jerk about this, but why is the answer always No? There seem
 to be more posts on this list relating to job openings than there are
 relating to code discussions. Are job postings a part why this list was
 originally created? If so, I'll stop now.


Fragmentation dilutes the community and creates an unnecessary barrier by
requiring people to know one more thing. Email filters take no time at all
to set up so anyone who considers them noise doesn't need to be exposed to
them.

kyle


Re: [CODE4LIB] barriers to open metadata?

2014-04-30 Thread Kyle Banerjee
Lack of demand, particularly since many catalogs contain a lot of garbage 
metadata and/or resources that others cannot access. Plus, the information goes 
stale quickly. Not that there's no use for this information, but not that many 
people are asking.

Also, despite declarations about wanting to make info open, library organizations
are much better at giving away other people's information than their own. A
huge percentage of librarians work at public expense, but if you do anything
for ALA or a number of other library outfits, copyright notices and other
restrictions competitive with the publishers we love to whine about get slapped
on mighty fast.

Kyle


 On Apr 29, 2014, at 1:02 PM, Laura Krier laura.kr...@gmail.com wrote:
 
 Hi Code4Libbers,
 
 I'd like to find out from as many people as are interested what barriers
 you feel exist right now to you releasing your library's bibliographic
 metadata openly. I'm curious about all kinds of barriers: technical,
 political, financial, cultural. Even if it seems obvious, I'd like to hear
 about it.
 
 Thanks in advance for your feedback! You can send it to me privately if
 you'd prefer.
 
 Laura
 
 -- 
 Laura Krier
 
 laurapants.com


Re: [CODE4LIB] convert MODS XML into CSV or tab-delimted text

2014-04-22 Thread Kyle Banerjee
Given that you'll most likely have to deal with elements that are missing
and/or repeat a variable number of times, conditional mappings, and data
that needs to be transformed, it may be easier to use a string parsing
routine to do what you need.
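
The sort of routine I mean, sketched in Python -- the MODS elements grabbed
here are just examples; repeats get joined with a delimiter and anything
missing comes out as an empty cell:

import csv
import xml.etree.ElementTree as ET

M = "{http://www.loc.gov/mods/v3}"

def grab(mods, path):
    # join repeatable elements; missing elements yield an empty string
    return "|".join(e.text.strip() for e in mods.findall(path) if e.text)

root = ET.parse("modsCollection.xml").getroot()        # hypothetical input
with open("mods.csv", "w") as out:
    w = csv.writer(out)
    w.writerow(["title", "names", "date", "subjects"])
    for mods in root.findall(M + "mods"):
        w.writerow([
            grab(mods, M + "titleInfo/" + M + "title"),
            grab(mods, M + "name/" + M + "namePart"),
            grab(mods, M + "originInfo/" + M + "dateIssued"),
            grab(mods, M + "subject/" + M + "topic"),
        ])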

kyle


On Tue, Apr 22, 2014 at 11:35 AM, English, Eben eengl...@bpl.org wrote:

 Hello,

 Does anyone out there have an XSL stylesheet to transform MODS XML into
 a CSV or tab-delimited text file?

 Even if it's highly localized to your own institution/project, it would
 probably still be useful.

 Thanks in advance,

 Eben English
 Web Services Developer
 Boston Public Library
 700 Boylston St.
 Boston, MA 02116
 617.859.2238
 eengl...@bpl.org



Re: [CODE4LIB] distributed responsibility for web content

2014-04-18 Thread Kyle Banerjee
 While 'letting chaos reign' might seem the best solution, we've found that it 
 also presents unforeseen accessibility and general readability issues, e.g.,
 entire pages of bolded or inappropriately colored text, not to mention making 
 entire websites look like, well, crap!  

This is a serious issue.

Of course there are also plenty of CMSes that make it virtually impossible to
present anything beyond what would have been eye candy in the 90's, forcing
units to outsource things they need to offsite vendors that aren't that great
but which can at least nominally provide a needed service.

Kyle

