Re: [CODE4LIB] Mass Unsubscription Update

2016-09-19 Thread Andromeda Yelton
Thanks for your quick work on this issue, and the informative updates.

On Mon, Sep 19, 2016 at 8:24 AM, Wayne Graham <wgra...@clir.org> wrote:

> Hi Everyone,
>
> I wanted to let folks know I _believe_ the issue that led to a bunch of
> folks using Gmail (and custom Google domains) getting unsubscribed from
> email lists on our servers last week has been resolved. I’ve been watching
> the email logs (and working with individuals) to make sure things have
> calmed down, but if you hear of anyone having a problem, please don’t hesitate
> to get in touch (slack, twitter, email, whatever).
>
> I also wrote up a brief explanation of what happened for those who are
> interested (https://www.diglib.org/archives/12638/).
>
> Best,
> Wayne
>



-- 
Andromeda Yelton
Vice President/President-Elect, Library & Information Technology
Association: http://www.lita.org
http://andromedayelton.com
@ThatAndromeda <http://twitter.com/ThatAndromeda>


[CODE4LIB] Job: Senior Systems Administrator, MIT Libraries

2017-07-08 Thread Andromeda Yelton
Do you enjoy making bulletproof server infrastructure? Come be my coworker
at MIT:
https://libraries.mit.edu/wp-content/uploads/2013/02/SVACSeniorSystemAdminREV.pdf

I'm happy to chat offlist.

-- 
Andromeda Yelton
Senior Software Engineer, MIT Libraries: https://libraries.mit.edu/
President, Library & Information Technology Association: http://www.lita.org
http://andromedayelton.com
@ThatAndromeda <http://twitter.com/ThatAndromeda>


Re: [CODE4LIB] Governance for Code4Lib (was: What's so bad about bylaws?)

2017-07-24 Thread Andromeda Yelton
> > > https://github.com/code4lib/antiharassment-policy/blob/master/code_of_conduct.md
> > > antiharassment-policy - Code4lib anti-harassment policy drafting space
> > >
> > > It may be useful to further document Code4lib's consensus-based
> > > procedures and policies for the benefit of legal entities that need to
> > > work with us, but a formal governance structure for the community (as
> > > opposed to that of an asset trustee) is something that I don't think
> > > the community needs or wants.
> > >
> > > Also, I think the notion that we're indebted to "dumb luck" forgets
> > > that "luck" is created by a lot of hard work.
> > >
> > > Eric
> > >
> > >
> > >> On Jul 24, 2017, at 11:01 AM, EDWIN VINCENT SPERR <esp...@uga.edu>
> > >> wrote:
> > >>
> > >> It is true that the Community has held 12 annual conferences without
> > >> formalization. And yes, it is likely *possible* to continue with the
> > >> current model of every conference being essentially a separate
> > >> entity, and support from the larger community being on an ad-hoc
> > >> basis. But the reason we are having this discussion is that this is
> > >> not a particularly good option -- it depends not only on good will,
> > >> but (as Coral has noted) dumb luck as well. It also means more stress
> > >> and effort on the part of each year's organizers than necessary.
> > >>
> > >> However, if we *do* form a relationship with another entity (or
> > >> self-incorporate), some person or persons will sign an agreement that
> > >> binds us, however you define "us", to a course of action that will
> > >> likely span several conferences. This is indeed a significantly
> > >> different type of decision than has come before, and it requires a
> > >> different way of doing business. Everybody has had a bad experience
> > >> or two with bureaucracy, but the current approach of trying to
> > >> maintain Code4Lib as an amorphous entity with no systematic way of
> > >> arriving at a decision or definable point of contact has real and
> > >> tangible drawbacks.
> > >>
> > >> So, in the spirit of the current way of doing things, I propose the
> > >> formation of an ad-hoc, self-nominated committee (perhaps the last of
> > >> its kind) to investigate a formal governance structure for Code4Lib
> > >> and then assist the Community with its implementation.
> > >>
> > >> If you're interested in joining me, please contact me off-list:
> > >> esp...@uga.edu
> > >>
> > >>
> > >>
> > >>> Date: Fri, 21 Jul 2017 16:35:13 -0400
> > >>> From: Adam Constabaris <adam_constaba...@ncsu.edu>
> > >>> Subject: Re: What's so bad about bylaws?
> > >>
> > >>> It's an interesting question, but code4lib -- whatever exactly that
> > >>> is -- has managed to make all sorts of decisions, about where to hold
> > >>> conferences, keynote speakers, etc. for over a decade without
> > >>> formalizing.
> > >>
> > >>> I am unclear on the exact details, but there is some carryover of
> > >>> conference funds from year to year and if I had to guess -- and this
> > >>> is a guess -- it relies on the good will of the previous year's
> > >>> fiscal sponsor(s) transferring the funds to the upcoming year's
> > >>> fiscal sponsor(s). However exactly that process works, it's happened
> > >>> multiple times at the direction of the community; each time, though,
> > >>> different parties are involved.
> > >>
> > >>> The F*C*IG is attempting to address (among other things) the
> > >>> tenuousness of that arrangement, and they've identified a number of
> > >>> proposals that appear to yield enough formal organization to ensure
> > >>> continuity. The decision doesn't strike me as more momentous or
> > >>> different in kind from the ones code4lib has made in the past, and
> > >>> shouldn't require any new mechanisms.
> > >>
> > >> Ed Sperr
> > >> Clinical Information Librarian
> > >> AU/UGA Medical Partnership
> > >> Athens, GA
> > >> esp...@uga.edu | esp...@stmarysathens.org
> >
>



-- 
Andromeda Yelton
Senior Software Engineer, MIT Libraries: https://libraries.mit.edu/
President, Library & Information Technology Association: http://www.lita.org
http://andromedayelton.com
@ThatAndromeda <http://twitter.com/ThatAndromeda>


Re: [CODE4LIB] accessing a python compressed sparse row format object

2017-09-26 Thread Andromeda Yelton
Do you have a link to the code you're using?
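
While waiting on that: a minimal plain-Python sketch of how CSR storage can be scanned for one column, which is the query described in the quoted message below. The function name `column_entries` and the toy numbers are made up for illustration; the three lists mirror the `data`, `indices`, and `indptr` arrays that a scipy `csr_matrix` exposes as attributes.

```python
def column_entries(data, indices, indptr, column):
    """Return (row, value) pairs for every stored entry in `column`.

    data    -- the non-zero values, stored row by row
    indices -- the column number of each value in `data`
    indptr  -- indptr[r]:indptr[r + 1] is the slice of `data` holding row r
    """
    hits = []
    for row in range(len(indptr) - 1):
        for position in range(indptr[row], indptr[row + 1]):
            if indices[position] == column:
                hits.append((row, data[position]))
    return hits


# Toy matrix (illustrative values only): 3 rows, 4 columns.
#   row 0: column 1 -> 0.5, column 3 -> 0.1
#   row 1: column 0 -> 0.2
#   row 2: column 1 -> 0.9, column 2 -> 0.3
data = [0.5, 0.1, 0.2, 0.9, 0.3]
indices = [1, 3, 0, 1, 2]
indptr = [0, 2, 3, 5]

# Every row with a score in column 1, highest score first.
ranked = sorted(column_entries(data, indices, indptr, 1),
                key=lambda pair: pair[1], reverse=True)
print(ranked)  # [(2, 0.9), (0, 0.5)]
```

With a real scipy matrix, slicing a column (`tfidf[:, column]`) or converting once with `tfidf.tocsc()` should be faster than this loop; the sketch is only meant to show what is stored where.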

On Tue, Sep 26, 2017 at 1:25 PM, Eric Lease Morgan <emor...@nd.edu> wrote:

> Does anybody here know how to access a Python compressed sparse row format
> (CSR) object? [1]
>
> I am using Python to do a bit of topic modeling (think “classification”),
> and so far, the results are more than plausible, but the results only
> return topics not documents corresponding to the topics. Along the way, my
> script creates a compressed sparse row format object, and it looks
> something like this:
>
>   (0, 16099)    0.055924002143
>   (0, 9497)     0.0256051292226
>   (0, 16202)    0.140746540109
>   (0, 38982)    0.000842900625312
>   :             :
>   (309, 40805)  0.0435077792741
>   (309, 45679)  0.0435077792741
>   (309, 19462)  0.0435077792741
>   (309, 8346)   0.0435077792741
>   (309, 31204)  0.0435077792741
>
> Where the first column denotes a document identifier, the second column
> denotes a topic identifier, and the third column denotes the score of the
> topic in the document. In the example above, document #0 is a lot about
> topic #16202 but not a lot about topic #38982.
>
> I want to query my CSR object. For example, given a topic identifier (e.g.,
> 48692), return a list of all document identifiers and scores from the
> object. I will then sort the scores to find which documents most
> significantly use the given topic.
>
> I can’t for the life of me figure out how to get what I need. I can get
> specific values of rows like this, where tfidf is my CSR object:
>
>   >>> print( tfidf[ 309, 31204 ] )
>   0.0435077792741
>
> Any help would be greatly appreciated.
>
> [1] CSR - http://bit.ly/2fPj42V
>
> —
> Eric Morgan
>



-- 
Andromeda Yelton
Senior Software Engineer, MIT Libraries: https://libraries.mit.edu/
President, Library & Information Technology Association: http://www.lita.org
http://andromedayelton.com
@ThatAndromeda <http://twitter.com/ThatAndromeda>


[CODE4LIB] Tokenizers for scientific corpora?

2017-11-22 Thread Andromeda Yelton
I'm doing a project to prototype machine-learning-driven interfaces to
MIT's thesis collection, and my preprocessing step would really benefit
from a tokenizer that is aware of common multi-word scientific tokens (e.g.
"inertial mass" should definitely be one token, not two).

My somewhat cursory research didn't turn any up, and a conversation in
code4lib slack just now shows I'm not the only one with this problem...does
anyone have anything handy to suggest? Thanks.

-- 
Andromeda Yelton
Senior Software Engineer, MIT Libraries: https://libraries.mit.edu/
President, Library & Information Technology Association: http://www.lita.org
http://andromedayelton.com
@ThatAndromeda <http://twitter.com/ThatAndromeda>


Re: [CODE4LIB] Comparing Barcodes Between 2 Files?

2017-11-23 Thread Andromeda Yelton
I definitely agree with Hannah's suggestion for data in Excel, but if you
have this kind of problem in future and your data is already in Python, you
can use set intersections.

set1 & set2 gives you a set of the elements common to both sets.

If your data are in lists, no problem - set(a_list) will turn a_list into a
set.

So set([your, first, list]) & set([your, second, list])

Sets, unlike lists, are unordered and don't allow duplicates. I tend to
find this is exactly what I am looking for if I am turning a list into a
set (in fact set(a_list) is a very handy way to remove duplicates), but
fair warning, in case this is not what you want.

https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset
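
Spelled out as something you can run, a minimal sketch; the barcode values and the function name `shared_barcodes` are invented for illustration:

```python
def shared_barcodes(first, second):
    """Return the barcodes present in both lists, sorted, without duplicates."""
    return sorted(set(first) & set(second))


# Made-up sample data; note the duplicate in the first list.
first_file = ["39001", "39002", "39003", "39002"]
second_file = ["39003", "39004", "39001"]

print(shared_barcodes(first_file, second_file))  # ['39001', '39003']
```

One thing to watch: values only match if they are the same type and spelling, so a barcode read as the number 39001 will not match the string "39001" or a copy with stray whitespace. Normalizing both columns first (for example with str(value).strip()) avoids that.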

On Mon, Nov 13, 2017 at 12:15 PM, Kyle Breneman <tomeconque...@gmail.com>
wrote:

> Thanks, everyone, for taking the time to reply to my question!  I liked the
> simplicity of Kyle Banerjee's command line suggestion, but stumbled over
> how to get the "cat" command to work on my Windows machine and couldn't
> figure out how to properly implement GnuWin32, so I went with a variation
> of Hannah's solution, using this formula in Excel:
>
> =IF(COUNTIF($C:$C, $A2)=0, "No", "Match!")
>
> On Fri, Nov 10, 2017 at 4:16 PM, Kyle Breneman <tomeconque...@gmail.com>
> wrote:
>
> > I have 2 Excel files, each with a column of barcodes.  I am supposed to
> > determine which, if any, of the barcodes in the first file are also
> > present in the second file.  Is writing a short Python program the best
> > way to do this, or is there a more efficient way?  (There are about 300
> > items in the first file and about 1,000 items in the second file.)
> >
> > Regards,
> > Kyle
> >
>



-- 
Andromeda Yelton
Senior Software Engineer, MIT Libraries: https://libraries.mit.edu/
President, Library & Information Technology Association: http://www.lita.org
http://andromedayelton.com
@ThatAndromeda <http://twitter.com/ThatAndromeda>


Re: [CODE4LIB] Help with parsing dates?

2017-11-03 Thread Andromeda Yelton
In re Github facets - some projects do use labels to specifically indicate
issues that may be beginner-friendly. The specific terms they use vary but
you can find a searchable/filterable aggregation at http://up-for-grabs.net/
. Also OpenHatch is a friendly community aimed at getting people involved,
and they have a place you can search for issues that might be relevant
(e.g. https://openhatch.org/search/?language=Python= ).

On Fri, Nov 3, 2017 at 9:29 AM, Julie Swierczek <jswie...@swarthmore.edu>
wrote:

> Dan,
>
> This. Is. Awesome. And your interpretation of the date string "1933,
> 1937-1938, 1941" is correct - I meant to say it should be 1933/1941. This
> sort of error is exactly why I wanted to approach this programmatically,
> and not type the dates by hand.  I used student employees to copy the data
> from the HTML pages into spreadsheets, and to check for spelling errors.
> However, I didn't want to use students to type the dates. I feel like that
> would be risking the creation of too much metacrap.  I can't even type them
> correctly myself, so I can't expect students to have 100% accuracy, either.
>
> Also, for anyone else following from home, I have to say why I love this
> solution compared to all the others.
>
> 1) I have over 400 spreadsheets, some with over 1000 lines. While I
> *could* use OpenRefine or Excel for a certain amount of date cleaning, that
> assumes I am interested in - and have the time  for - opening each file
> individually and working on the dates one spreadsheet at a time. I can set
> this script up to run through a bunch of csv files. I don't need to look at
> them.  (And, yes, I know how to set up a task in OpenRefine and save it and
> use it again later - and I was working on building one of those - but that
> is more time consuming than I want this task to be.)
>
> 2) This doesn't use Ruby or perl or other tools that I don't know and
> don't have time to use now. I said I can handle basic Python, and that's
> what this is.
>
> 3) This is written simply and clearly, and doesn't do too much of 'let's
> prove how awesome I am by using as few lines of code as possible', which is
> really hard for newbies to interpret and change.   (You know what I'm
> talking about - something that a newbie would write in 200 lines and
> someone else says, "Yeah, you idiot, I can do that in two lines". Cf. ALL
> OF STACK OVERFLOW.)
>
> 4) Building on point number 3, this is written simply and clearly enough
> that I can figure out how to modify it further if I come across any other
> date cases that I haven't discovered so far.  I would even feel confident
> enough to submit a pull request if I do develop solutions for other date
> formats for this.
>
> 5) Further, this is written simply and clearly enough that I can use this
> as a model for figuring out how to write other Python stuff to handle other
> similar tasks.  This is now my favorite thing in all of GitHub. (I wish
> GitHub had a special facet for 'newbie friendly' stuff.  I know that is
> somewhat subjective, but I can't tell you how many 'easy' tools that have
> been recommended to me that would take me roughly a week to figure out how
> to run once, and possibly another month of trying to troubleshoot error
> messages to get it to actually work. Cf.
> http://tpverso.com/an-open-letter-to-open-source-projects-for-lams/)
>
> I again want to thank Dan for this code and I also want to commend it to
> everyone else's attention as the sort of code that is really friendly to
> newbies. If you are thinking of writing a tool and you want to be able to
> share it with institutions of all sizes, with a really low barrier to entry
> (e.g., the knowledge of how to put a .py file in a directory, change the
> filename in the .py file, and then run 'python test.py'), then this is a
> good model of how code should be written. Also, while I am on my soapbox,
> here's a great model for documentation:
> https://github.com/CarletonArchives/BagBatch.
>
> Thus Endeth the Lecture.
>
> Dan, thanks again. This just made my semester.
>
> Julie Swierczek
> Transformer of Dates
>



-- 
Andromeda Yelton
Senior Software Engineer, MIT Libraries: https://libraries.mit.edu/
President, Library & Information Technology Association: http://www.lita.org
http://andromedayelton.com
@ThatAndromeda <http://twitter.com/ThatAndromeda>


Re: [CODE4LIB] clustering techniques for normalizing bibliographic data

2017-10-25 Thread Andromeda Yelton
It turns out it's straightforward to reimplement the default fingerprinting
algorithm that OpenRefine uses. We did that here to help catch those sorts
of trivial spelling differences in user searches in order to provide
best-bet suggestions for some of our most popular stuff. Here's my
reimplementation; have fun:
https://github.com/MITLibraries/bento/blob/ec3901657a8548f9c16c8d31a866d7239aa29f0c/app/models/hint.rb#L48

Once you have a cluster of strings with a common fingerprint, you'd need to
pick a canonical form for everything in that cluster, since the fingerprint
itself isn't a thing you'd want to expose to humans.
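
For anyone who would rather see the idea than read Ruby, here is a rough Python sketch of key-collision fingerprinting plus one way to pick a canonical form. It is an approximation of the steps OpenRefine documents (trim, lowercase, strip punctuation, fold to ASCII, sort and de-duplicate tokens), not a port of the code linked above; the names `fingerprint` and `canonical_forms`, and the choice of "most frequent spelling wins", are assumptions made for illustration.

```python
import string
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value):
    """Reduce a string to a key that ignores case, punctuation, and word order."""
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII equivalents.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Drop punctuation and control characters, but keep whitespace separators.
    text = "".join(
        ch for ch in text
        if ch not in string.punctuation and (ch.isprintable() or ch.isspace())
    )
    # Sort the unique tokens so word order no longer matters.
    return " ".join(sorted(set(text.split())))


def canonical_forms(values):
    """Map each fingerprint to the most frequent original spelling in its cluster."""
    clusters = defaultdict(Counter)
    for value in values:
        clusters[fingerprint(value)][value] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in clusters.items()}


places = ["South Bend", "South Bend", "south bend.", "Bend, South", "South Bend, IN"]
for key, canonical in canonical_forms(places).items():
    print(repr(key), "->", repr(canonical))
```

Note that fingerprinting only collapses differences in case, punctuation, accents, and word order, so "South Bend, IN" and "South Bend, Ind." still land in separate clusters; those need a nearest-neighbor method or a lookup table.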

On Wed, Oct 25, 2017 at 11:57 AM, Eric Lease Morgan <emor...@nd.edu> wrote:

> Has anybody here played with any clustering techniques for normalizing
> bibliographic data?
>
> My bibliographic data is fraught with inconsistencies. For example, a
> publisher’s name may be recorded one way, another way, or a third way. The
> same goes for things like publisher place: South Bend; South Bend, IN;
> South Bend, Ind. And then there is the ISBD punctuation that is sometimes
> applied and sometimes not. All of these inconsistencies make indexing &
> faceted browsing more difficult than it needs to be.
>
> OpenRefine is a really good program for finding these inconsistencies and
> then normalizing them. OpenRefine calls this process “clustering”, and it
> points to a nice page describing the various clustering processes. [1] Some
> of the techniques include “fingerprinting” and calculating “nearest
> neighbors”. Unfortunately, OpenRefine is not really programmable, and I’d
> like to automate much of this process.
>
> Does anybody here have any experience automating the process of normalizing
> bibliographic (MARC) data?
>
> [1] about clustering - http://bit.ly/2izQarE
>
> —
> Eric Morgan
>



-- 
Andromeda Yelton
Senior Software Engineer, MIT Libraries: https://libraries.mit.edu/
President, Library & Information Technology Association: http://www.lita.org
http://andromedayelton.com
@ThatAndromeda <http://twitter.com/ThatAndromeda>


Re: [CODE4LIB] MARC Holdings

2018-01-18 Thread Andromeda Yelton
Note that if perl isn't your thing there are MARC libraries in several
languages - python and ruby at least, probably others I don't remember off
the top of my head (since I work in python and ruby, no shade to other
people's languages :). https://github.com/edsu/pymarc ,
https://github.com/ruby-marc/ruby-marc .

On Thu, Jan 18, 2018 at 12:50 PM, Julie Cole <jc...@langara.ca> wrote:

> Hello all,
> I'm pretty new to the world of library systems and this is my first post.
>
> Anyone have any experience parsing MARC Holding records (853 and 863) into
> a more readable 866 or 867 format?
> We are wanting to export our holdings from our ILS into our Discovery
> Layer and trying to save some of the money that the ILS vendor would charge
> us to create the records.
>
> The parsing doesn't look fun, so I was hoping someone has some code to use
> as a starting point.
> Also, I'm not sure how clean our data in 853 and 863 is so any scripts or
> advice on gotchas when cleaning that up would be appreciated.
> We have about 60,000 holding records.
>
> Thanks,
> Julie.
>
>
> Julie Cole
> Library Systems Administrator
> Langara College Library
> Vancouver, BC
>



-- 
Andromeda Yelton
Senior Software Engineer, MIT Libraries: https://libraries.mit.edu/
President, Library & Information Technology Association: http://www.lita.org
http://andromedayelton.com
@ThatAndromeda <http://twitter.com/ThatAndromeda>


Re: [CODE4LIB] Finalized Duty Officer List for Code4Lib 2018

2018-01-29 Thread Andromeda Yelton
Thank you to everyone who's willing to take this on! You're super great and
you make c4l a better place.

On Thu, Jan 25, 2018 at 5:25 PM, Becky Yoose <b.yo...@gmail.com> wrote:

> Hello everyone,
>
> Apologies for the delay! We have a finalized list of Duty Officers for the
> Code4Lib 2018 conference:
>
> On-site Duty Officers:
>
> - Shaun Ellis
> - Mark Matienzo
> - Galen Charlton
> - Christie Peterson
> - Chad Nelson
> - Bobbi Fox
> - Bethany Nowviskie
> - Becky Yoose
>
> Online Duty Officers:
>
> - Becca Quon
> - Jill Locascio
> - Katherine Kim
> - Josh Hutchinson
> - Karen Coyle
>
> More information about how to contact Duty Officers during the conference,
> as well as Duty Officer schedules, will be posted to the conference site
> shortly. Stay tuned...
>
> Thank you,
> Becky
>



-- 
Andromeda Yelton
Senior Software Engineer, MIT Libraries: https://libraries.mit.edu/
President, Library & Information Technology Association: http://www.lita.org
http://andromedayelton.com
@ThatAndromeda <http://twitter.com/ThatAndromeda>


[CODE4LIB] Berkman Klein call for 2019 fellows

2018-10-31 Thread Andromeda Yelton
Do you do scholarship, activism, software development, law, or something
else awesome in the internet-and-society space? Would you like to carry on
that work in the midst of a multidisciplinary cohort of brilliant people?
MPOW has opened its call for 2019-2020 fellows:

https://cyber.harvard.edu/getinvolved/fellowships/1920Fellows

Please have a look if this sounds like your sort of thing. I'm not involved
with the selection process, but I'm happy to answer questions about Berkman
Klein and Cambridge, MA. There are also several fellows, affiliates, and
alumni in the library world, whom I invite to chime in :)

Note that fellows are expected to spend a fair fraction of their time
physically present in Cambridge but affiliates can be more loosely coupled,
so please think about this whether or not you're in the area.

-- 
Andromeda Yelton
Web Applications Developer, Berkman Klein Center: https://cyber.harvard.edu
Past President, Library & Information Technology Association:
http://www.lita.org
http://andromedayelton.com
@ThatAndromeda <http://twitter.com/ThatAndromeda>


Re: [CODE4LIB] From the Community Support Squad wrt "Note [admiistratativia]"

2019-07-12 Thread Andromeda Yelton
Please keep in mind that not everyone has an institutional affiliation, and
even people who do might prefer to use a non-institutional address so that
the address they use to interface with code4lib remains stable even as
their institutional affiliation changes, especially if they're in a term
appointment or temporary/contingent labor situation where they cannot
expect long tenure (regrettably common).

Personally I'm also quite content with consistent pseudonyms, but I'm happy
to defer to the community on that one if there's a consensus to the
contrary. Requiring institutional affiliation has larger philosophical and
practical challenges and I would not be comfortable with that.

On Fri, Jul 12, 2019 at 11:07 AM Eric Lease Morgan wrote:

> On Jul 11, 2019, at 4:09 PM, Kate Deibel <
> 001fd0f2bb98-dmarc-requ...@lists.clir.org> wrote:
>
> > For people who lack either github or git knowledge and don't want to
> > just try to read the diff outputs, here are the links you need...
> >
> > --
> > Katherine Deibel | PhD
> > Inclusion & Accessibility Librarian
> > Syracuse University Libraries
> > T 315.443.7178
> > kndei...@syr.edu
> > 222 Waverly Ave., Syracuse, NY 13244
> > Syracuse University
>
>
> One thing I hope to see in the revision/update to our codes-of-conduct is
> in regards to signatures; personally, I think each posting to the mailing
> list ought to be non-anonymous.
>
> With the advent of some sort of new SMTP enhancement called DMARC, it is
> possible to post to LISTSERV applications (like ours) and have your email
> address obfuscated, like above. This is apparently a feature. [0] Yes,
> direct replies to an address like
> 001fd0f2bb98-dmarc-requ...@lists.clir.org do make it back to the
> original sender, but without some sort of signature it can be very
> difficult to know to whom one is replying.
>
> I think any poster to the mailing list ought to be easily identifiable. One
> ought to be able to easily know the name of the poster, their affiliation,
> and their email address. This 1) makes things more transparent, and 2)
> lends credibility to the post. Even if I don't sign this message you can
> see that my name is Eric Morgan, I work for Notre Dame, and my address is
> emor...@nd.edu. The posting above works because there is/was a full
> signature. Postings from firstname_lastn...@gmail.com are difficult to
> swallow but I can live with them. But postings from EM <
> 001fd0f2bb98-dmarc-requ...@lists.clir.org> with no signature I think
> are not respectful. Remember, "On the Internet, nobody knows you are a
> dog." [1]
>
> [0] dmarc - https://www.lsoft.com/news/dmarc-issue1-2018.asp
> [1] dog -
> https://en.wikipedia.org/wiki/On_the_Internet,_nobody_knows_you're_a_dog
>
> --
> Eric Morgan
> University of Notre Dame
>
> 574/631-8604
>


-- 
Andromeda Yelton
Web Applications Developer, Berkman Klein Center: https://cyber.harvard.edu
http://andromedayelton.com
@ThatAndromeda <http://twitter.com/ThatAndromeda>


Re: [CODE4LIB] Photos - video - music - Oh My!

2020-02-02 Thread Andromeda Yelton
I'm a big unsplash fan (and use it almost exclusively for slide decks now)
-- high-quality photos, extremely permissive license.

On Sat, Feb 1, 2020 at 2:12 PM Brent Ferguson 
wrote:

> Hi,
>
> I also love pixabay, just discovered https://unsplash.com/ and really
> like it.
>
> For video I search for videos that are licensed under creative commons or
> are royalty free stock videos.
>
> Also use flickr.com and images.google.com, and once again search for
> creative commons or royalty free. etc.
>
> Brent Ferguson, MLS
> Librarian, Elkhart Public Library
>
>
> 
> From: Code for Libraries [CODE4LIB@LISTS.CLIR.ORG] on behalf of charles
> meyer [reachmepl...@gmail.com]
> Sent: Saturday, February 01, 2020 1:55 PM
> To: CODE4LIB@LISTS.CLIR.ORG
> Subject: [CODE4LIB] Photos - video - music - Oh My!
>
> Hi my esteemed listmates,
>
> I’m curious which sites you’ve found helpful when you need to use photos,
> videos, music, etc. for library events?
>
> For photos, pixabay has been pretty good.
>
> Are there other sites you’ve enjoyed?
>
> Thank you!
>


-- 
Andromeda Yelton
Web Applications Developer, Berkman Klein Center: https://cyber.harvard.edu
http://andromedayelton.com
@ThatAndromeda <http://twitter.com/ThatAndromeda>


Re: [CODE4LIB] modeling data and metadata for repository ingest

2020-11-11 Thread Andromeda Yelton
I think you will be happiest in the long run if Tree exposes an interface
that is the same as other interfaces you are familiar with, and it is
entirely reasonable for a Node object to 1) exist and 2) know its own path.
Also I think a "copy" method should only copy, not "copy and instantiate"
(if a function is most accurately described with a phrase containing 'and',
it wants to be at least two functions). Keeping its responsibilities small
will make it easier to write, test, and maintain.

There's something pulling at my brain about this class structure that I
can't quite identify without seeing the data, but it is something about the
name and responsibilities of Tree. Knowing how to copy is treelike. But
knowing how to deal with specific metadata types is possibly more Nodelike?

You say there are lots of possible input types and output types -- what
does the part between them look like? Does everything go through some sort
of common state? If so, it would make sense for a Node to know how to
transform between its content type and that common form, and for Trees to
deal only with the common form. Admittedly I cannot imagine what that
common form would look like. But otherwise you're writing a fully-connected
graph of transforms between everything and everything and you will be
extremely sad as this graph grows.

Anyway. I'm not quite sure where I'm going with this, without having the
code in front of me. But I think it's worth being very explicit with
yourself about what you expect the responsibilities of each class to be,
because then you can look at whether those responsibilities make sense,
whether the class names correctly describe those sets of responsibilities,
and what interfaces you need to expose to make it harmonize.
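
To make that concrete, here is one way the split of responsibilities could look, as a minimal sketch in Python rather than the toolkit's Ruby. Every name here (`MemoryTree`, `Node.read`, `Node.write`, the module-level `cp`) is hypothetical and chosen for illustration, not the toolkit's actual API, and metadata transforms are left out entirely.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import PurePosixPath


class Tree(ABC):
    """A package of data addressed by paths relative to its root."""

    @abstractmethod
    def read(self, path):
        """Return the bytes stored at `path`."""

    @abstractmethod
    def write(self, path, data):
        """Store `data` at `path`."""

    def node(self, path):
        """Build a Node for `path` in this Tree; nothing is created on disk."""
        return Node(self, PurePosixPath(path))


@dataclass(frozen=True)
class Node:
    """A location: one Tree plus one path inside it. It knows its own path."""
    tree: Tree
    path: PurePosixPath

    def read(self):
        return self.tree.read(self.path)

    def write(self, data):
        self.tree.write(self.path, data)


def cp(src, dest):
    """Copy only: both Nodes are instantiated by the caller beforehand."""
    dest.write(src.read())


class MemoryTree(Tree):
    """Stand-in backend for the sketch; real subclasses might wrap a
    directory or a zipfile behind the same two methods."""

    def __init__(self, files=None):
        self.files = dict(files or {})

    def read(self, path):
        return self.files[str(path)]

    def write(self, path, data):
        self.files[str(path)] = data


source = MemoryTree({"theses/a.pdf": b"%PDF"})
ingest = MemoryTree()
cp(source.node("theses/a.pdf"), ingest.node("ingest/a.pdf"))
print(sorted(ingest.files))  # ['ingest/a.pdf']
```

Because instantiating a Node here is just `tree.node(path)`, the cost of "output Nodes would need to be instantiated" stays at one short call, and `cp` never has to know which kind of Tree is on either side.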

On Tue, Nov 10, 2020 at 4:34 PM McDonald, Stephen 
wrote:

> Fellow library code wranglers,
>
> Coding questions don't come up often here, but I think this might be the
> best group to ask, as my question somewhat involves both coding and the
> nature of metadata and data.  A considerable amount of my work involves
> ingesting materials into our institutional repository.  We get this
> material from many sources in many formats; PDF, Quicktime, WAV, etc., with
> metadata in XML, MARC21, or even spreadsheets.  It might be organized as
> filesystem directories, zip files, or images with imbedded metadata.
> Before loading into the repository, the metadata must be extracted and
> transformed, and the data files reorganized for convenient ingest.
>
> To make this easier, we have written a toolkit (in Ruby) which handles the
> conversion.  You select the source type (e.g. zipfile of electronic theses
> from Proquest), specify the directory/zipfile/whatever containing the data,
> and the toolkit executes all the transforms and organizes into a convenient
> directory structure, ready to ingest into the repository.  The problem is
> that the code in the toolkit is clunky, making it difficult to add new
> sources and the needed transformations.
>
> I am rewriting the toolkit from scratch, with a modular design.  I want a
> consistent set of methods defined in an abstract class for a package of
> data (which I am calling a Tree), with subclasses defining the exact
> behavior of the methods for directories, zipfiles, images with imbedded
> metadata, etc.  I'm sure this is familiar to some of you.  A file or
> directory (or analog) within a Tree is defined as a path from the root of
> the Tree.
>
> The question I have is the best model to use for the arguments of the
> methods of this class.  For instance, I want an analog to the copy method,
> to copy a file from the input Tree to the new ingest Tree.  The ruby
> filesystem copy method is .cp(src, dest).  An analog method would have to
> specify the input Tree along with the input path, and the output Tree plus
> the output path.  So I could define the method as Tree.cp(srctree, srcpath,
> desttree, destpath).  Or I could go a little more abstract and define a
> class Node which is a combination of a Tree and a path.  Then I could
> create Tree.cp(srcnode, destnode), which looks more like the familiar
> filesystem methods.
>
> Does anyone have an opinion on which would be better?  Using Nodes looks a
> lot cleaner and appeals to my sense of organization.  I will be defining a
> Tree.glob method, so that should handle instantiating source Nodes, but
> output Nodes would need to be instantiated.  The first method avoids the
> complication of instantiating Nodes before using them in copy and move
> commands.  I'm not sure which would be easier for writing specific ingest
> routines for a new data source, since someday someone else will have to
> write them.  Any thoughts?
>
>
>   Steve McDonald
>
>   steve.mcdon...@tufts.edu<mailto:steve.mcdon...@tuf

Re: [CODE4LIB] Code4Lib jobs list data dump?

2021-01-22 Thread Andromeda Yelton
The initial commit in https://github.com/code4lib/shortimer/ was November
2011, which is ten years for some values of ten. Taking a quick and
noncomprehensive glance around, I see postings as old as 2005. I don't see
an obvious API, but maybe a maintainer could weigh in about data dump
possibilities?
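
On the mbox route mentioned in the quoted message below: Python's standard-library `mailbox` module can read mbox files, so a first pass at a spreadsheet-friendly dump might look like this sketch. It assumes job postings can be recognized by "Job:" in the subject line, which is only a guess based on how postings appear on this list; the function names and file paths are placeholders.

```python
import csv
import mailbox


def job_postings(mbox_path):
    """Yield (date, subject) for each message whose subject contains 'Job:'."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            # str() also copes with subjects that come back as Header objects.
            subject = str(message.get("Subject", ""))
            if "Job:" in subject:
                yield str(message.get("Date", "")), subject
    finally:
        box.close()


def write_csv(rows, csv_path):
    """Write the rows to a CSV file, which Excel can open directly."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["date", "subject"])
        writer.writerows(rows)


# Example call (both paths are placeholders):
# write_csv(job_postings("code4lib.mbox"), "code4lib-jobs.csv")
```

This only gets dates and subject lines. Pulling job titles, locations, or salaries out of the message bodies would take additional parsing and is likely where the real effort lies.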

On Fri, Jan 22, 2021 at 11:28 AM Eric Lease Morgan wrote:

> On Jan 22, 2021, at 11:11 AM, Jill Ellern wrote:
>
> > I'm doing some research into systems librarian duties and wondering if
> > there is an easy way to get a dump of the code4lib jobs from the last 10
> > years?  In excel format?
>
>
> Easy? I'd be surprised.
>
> There are two or three sources of the Code4Lib jobs data:
>
>   1. the underlying data from the jobs.code4lib.org site
>
>   2. any one of a number of different Code4Lib mailing list Web archives
>
>   3. the archived mailbox (mbox) files from the mailing list
>
> I don't think the jobs site has been around for ten years. Has it? Nor do
> I know whether or not the data is archived. If it is, then I'd bet you will
> be able to get it in some sort of structured format like JSON or a delimited
> format like Excel.
>
> Scraping different Web archives would require... scraping which,
> personally, I run away from.
>
> Finally, the archived mbox files would be the most comprehensive, but a
> programmer would have to parse the mbox (email) files, which is a
> specialized task in and of itself. If you want to know where the mbox files
> are located, then drop me a line and I'll let you know. Easy.
>
> Finally, what are the questions you would like to answer? How many systems
> librarian jobs have been posted? Where were the jobs? What are the
> characteristics of systems librarianship and how have they changed over
> time? How much do they pay? Extracting some of this information from the
> postings may be difficult, if not heroic in nature.
>
> --
> Eric Morgan
> University of Notre Dame
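
For the mbox route Eric mentions, here is a minimal sketch using Python's standard-library mailbox module. It assumes job postings can be spotted by "Job:" in the subject line (as in list subjects such as "[CODE4LIB] Job: ..."), which is a guess and will miss postings titled differently. The function names and the CSV columns are made up for illustration, and encoded (RFC 2047) subjects are not decoded.

```python
import csv
import mailbox
from email.utils import parsedate_to_datetime


def job_rows(mbox_path, marker="Job:"):
    """Yield (ISO date, sender, subject) for messages whose subject contains marker."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            subject = str(message.get("Subject", ""))
            if marker not in subject:
                continue
            try:
                raw_date = str(message.get("Date", ""))
                date = parsedate_to_datetime(raw_date).date().isoformat()
            except (TypeError, ValueError):
                date = ""  # missing or malformed Date header
            # Collapse whitespace left over from folded header lines.
            yield date, str(message.get("From", "")), " ".join(subject.split())
    finally:
        box.close()


def write_csv(mbox_path, csv_path):
    """Write matching postings to a CSV file, which Excel can open."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["date", "from", "subject"])
        writer.writerows(job_rows(mbox_path))
```

This only gets as far as a list of postings by date. Pulling out salary, location, or duties would still mean reading the message bodies, which is the heroic part.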



-- 
Andromeda Yelton
Humanistic Machine Learning for Library Data
Lecturer, San José State University iSchool
https://andromedayelton.com
@ThatAndromeda
<http://twitter.com/ThatAndromeda>


[CODE4LIB] Call for volunteers: Keynote Committee

2023-09-18 Thread Andromeda Yelton
Have you enjoyed previous years’ keynotes? Want to make sure this year’s
are awesome too? Come join me on the Keynote Committee! We’ll be starting
work soon and wrapping up by January, so this is a great option if you want
to contribute to code4lib but spring is a hard time for you to volunteer.
We still need a cochair (I’m the chair) and a documentarian; both of these
should be fairly light lifts, as the documentation from last year is in good
order and we’ve already got a timeline and tasks planned out.

Feel free to email me or message me in Slack (@thatandromeda) if you have
questions, or just go ahead and add yourself on the wiki
<https://wiki.code4lib.org/Code4Lib_2024_Conference_Committees#Keynote_Committee>.
-- 
Andromeda Yelton
Senior Software Engineer, JSTOR Labs
Lecturer, San José State University iSchool
https://andromedayelton.com

@thatandromeda (Mastodon <https://ohai.social/@thatandromeda>, Bluesky
<https://bsky.app/profile/thatandromeda.bsky.social>, github
<https://github.com/thatandromeda>)


[CODE4LIB] Call for Keynote Nominations - Code4Lib 2024

2023-10-17 Thread Andromeda Yelton
Code4Lib 2024 is a loosely-structured conference that provides people
working at the intersection of libraries/archives/museums/cultural heritage
and technology with a chance to share ideas, be inspired, and forge
collaborations. The conference will be held May 13-16, 2024 at the
University of Michigan in Ann Arbor. You can find more information about
the conference on the conference web site (https://2024.code4lib.org/) and
more information about the Code4Lib community at https://code4lib.org/about/.

Keynote speaker nominations for the Code4Lib 2024 Conference will be
accepted from now through Nov. 16. Nominations should be made on our
keynote nominations page (
https://wiki.code4lib.org/2024_Keynote_Speakers_Nominations) on the
Code4Lib wiki. To get a wiki account, please email Ryan Wick at
ryanw...@gmail.com; if you wish to make a nomination without a wiki
account, please email Mike Taylor at mike.tay...@nau.edu.

When making a nomination, please consider whether the nominee is likely to
be an excellent contributor in each of the following areas:

1) **Appropriateness**. Is this speaker likely to convey information that
is useful to many members of our community?

2) **Uniqueness**. Is this speaker likely to cover themes that may not
commonly appear in the rest of the program?

3) **Contribution to diversity**. Will this person bring something rare,
notable, or unique to our community, through unusual experience or
background?

Please include the following information in your nomination:

- Speaker’s full name

- Brief description of individual (250-word max)

- Pertinent links (Maximum of 3)

- Contact information for candidate (email address)

The Keynote Committee will attempt to contact all nominees and will only
include on the ballot those who consent to be nominated.

Andromeda Yelton,

On behalf of the Code4Lib 2024 Keynote Committee

** If you would prefer to submit a nomination anonymously, please send your
nominee(s) to Mike Taylor at mike.tay...@nau.edu.

-- 
Andromeda Yelton
Senior Software Engineer, JSTOR Labs
Lecturer, San José State University iSchool
https://andromedayelton.com

@thatandromeda (Mastodon <https://ohai.social/@thatandromeda>, Bluesky
<https://bsky.app/profile/thatandromeda.bsky.social>, github
<https://github.com/thatandromeda>)


[CODE4LIB] Second Call for Keynote Nominations - Code4Lib 2024

2023-11-08 Thread Andromeda Yelton
Reminder: Keynote speaker nominations for the Code4Lib 2024 Conference are
still being accepted through Nov. 16. Code4Lib 2024 is a loosely-structured
conference that provides people working at the intersection of
libraries/archives/museums/cultural heritage and technology with a chance
to share ideas, be inspired, and forge collaborations. The conference will
be held May 13-16, 2024 at the University of Michigan in Ann Arbor. You can
find more information about the conference on the conference web site (
https://2024.code4lib.org/) and more information about the Code4Lib
community at https://code4lib.org/about/.

Nominations should be made on our keynote nominations page (
https://wiki.code4lib.org/2024_Keynote_Speakers_Nominations) on the
Code4Lib wiki. To get a wiki account, please email Ryan Wick at
ryanw...@gmail.com; if you wish to make a nomination without a wiki
account, please email Mike Taylor at mike.tay...@nau.edu.

When making a nomination, please consider whether the nominee is likely to
be an excellent contributor in each of the following areas:

1) *Appropriateness*. Is this speaker likely to convey information that is
useful to many members of our community?

2) *Uniqueness*. Is this speaker likely to cover themes that may not
commonly appear in the rest of the program?

3) *Contribution to diversity*. Will this person bring something rare,
notable, or unique to our community, through unusual experience or
background?

Please include the following information in your nomination:

- Speaker’s full name

- Brief description of individual (250-word max)

- Pertinent links (Maximum of 3)

- Contact information for candidate (email address)

The Keynote Committee will attempt to contact all nominees and will only
include on the ballot those who consent to be nominated.

Andromeda Yelton,

On behalf of the Code4Lib 2024 Keynote Committee

** If you would prefer to submit a nomination anonymously, please send your
nominee(s) to Mike Taylor at mike.tay...@nau.edu.

-- 
Andromeda Yelton
Senior Software Engineer, JSTOR Labs
https://andromedayelton.com

@thatandromeda (Mastodon <https://ohai.social/@thatandromeda>, Bluesky
<https://bsky.app/profile/thatandromeda.bsky.social>, github
<https://github.com/thatandromeda>)


[CODE4LIB] Reminder: Keynote voting is open for Code4Lib 2024

2023-12-13 Thread Andromeda Yelton
Reminder: Voting for Code4Lib 2024 Keynote speakers is ongoing and will
close in 1 week on December 20, 2023.

All nominees have been contacted and the 7 included in this election are
potentially available to speak.

When ranking nominees, please consider whether they are likely to be an
excellent contributor in each of the following areas:

1) *Appropriateness*. Is this speaker likely to convey information that is
useful to many members of our community?

2) *Uniqueness*. Is this speaker likely to cover themes that may not
commonly appear in the rest of the program?

3) *Contribution to diversity*. Will this person bring something rare,
notable, or unique to our community, through unusual experience or
background?

In order to limit the vote to one per individual, an email address linked
to a Google account is required. Access the form here:
https://forms.gle/PzHt72WQcP5TSEeN7

Andromeda Yelton,

On behalf of the Code4Lib 2024 Keynote Committee.
-- 
Andromeda Yelton
Senior Software Engineer, JSTOR Labs
https://andromedayelton.com

@thatandromeda (Mastodon <https://ohai.social/@thatandromeda>, Bluesky
<https://bsky.app/profile/thatandromeda.bsky.social>, github
<https://github.com/thatandromeda>)


[CODE4LIB] Keynote Speaker Voting - Code4Lib 2024

2023-12-04 Thread Andromeda Yelton
The Code4Lib 2024 Keynote Committee is happy to open this year’s invited
speaker election.

All nominees have been contacted and the 7 included in this election are
potentially available to speak. The top two available vote recipients will
be invited to be our keynote speakers this year; in the case of a tie, the
speaker closer to Ann Arbor will be invited. Voting will end in 16 days on
December 20.

When rating nominees, please consider whether they are likely to be an
excellent contributor in each of the following areas:

1) *Appropriateness*. Is this speaker likely to convey information that is
useful to many members of our community?

2) *Uniqueness*. Is this speaker likely to cover themes that may not
commonly appear in the rest of the program?

3) *Contribution to diversity*. Will this person bring something rare,
notable, or unique to our community, through unusual experience or
background?

In order to limit the vote to one per individual, an email address is
required. Access the form here: https://forms.gle/PzHt72WQcP5TSEeN7 .

Andromeda Yelton,

On behalf of the Code4Lib 2024 Keynote Committee.
-- 
Andromeda Yelton
Senior Software Engineer, JSTOR Labs
https://andromedayelton.com

@thatandromeda (Mastodon <https://ohai.social/@thatandromeda>, Bluesky
<https://bsky.app/profile/thatandromeda.bsky.social>, github
<https://github.com/thatandromeda>)


[CODE4LIB] Final Call for Keynote Nominations - Code4Lib 2024

2023-11-13 Thread Andromeda Yelton
Keynote speaker nominations for the Code4Lib 2024 Conference will close at
midnight on Nov. 16. Nominations can still be made on our keynote
nominations page (
https://wiki.code4lib.org/2024_Keynote_Speakers_Nominations) on the
Code4Lib wiki. To get a wiki account, please email Ryan Wick at
ryanw...@gmail.com; if you wish to make a nomination without a wiki
account, please email Mike Taylor at mike.tay...@nau.edu.

Code4Lib 2024 is a loosely-structured conference that provides people
working at the intersection of libraries/archives/museums/cultural heritage
and technology with a chance to share ideas, be inspired, and forge
collaborations. The conference will be held May 13-16, 2024 at the
University of Michigan in Ann Arbor. You can find more information about
the conference on the conference web site (https://2024.code4lib.org/) and
more information about the Code4Lib community at https://code4lib.org/about/.

When making a nomination, please consider whether the nominee is likely to
be an excellent contributor in each of the following areas:

1) *Appropriateness*. Is this speaker likely to convey information that is
useful to many members of our community?

2) *Uniqueness*. Is this speaker likely to cover themes that may not
commonly appear in the rest of the program?

3) *Contribution to diversity*. Will this person bring something rare,
notable, or unique to our community, through unusual experience or
background?

Please include the following information in your nomination:

- Speaker’s full name

- Brief description of individual (250-word max)

- Pertinent links (Maximum of 3)

- Contact information for candidate (email address)

The Keynote Committee will attempt to contact all nominees and will only
include on the ballot those who consent to be nominated.

Andromeda Yelton,

On behalf of the Code4Lib 2024 Keynote Committee

** If you would prefer to submit a nomination anonymously, please send your
nominee(s) to Mike Taylor at mike.tay...@nau.edu.
-- 
Andromeda Yelton
Senior Software Engineer, JSTOR Labs
https://andromedayelton.com

@thatandromeda (Mastodon <https://ohai.social/@thatandromeda>, Bluesky
<https://bsky.app/profile/thatandromeda.bsky.social>, github
<https://github.com/thatandromeda>)