[CODE4LIB] Registry blog post of interest ...
As some of you know, the RDA registrars have been working with the Deutsche Nationalbibliothek to enable a German language translation of the RDA elements and vocabularies to be available using the same mechanism as the English original. Today, Veronika Leibrecht, who's been working on this, added a new post to the Registry Blog (http://metadataregistry.org/blog) giving some information on how that process looks close up. A bit of a taste:

/A prerequisite for the registering of our terms in the NSDL Registry and one of the greatest challenges for the German National Library at the moment is the translation of the RDA elements and vocabularies. Since bibliographic description is executed with a highly specialised vocabulary, we are finding that the process of finding the appropriate terms is interesting but also highly involved. Although the existing German rules for bibliographic description (RAK) and the authority files for subject headings (Schlagwortnormdatei, or SWD) have plenty of vocabulary to offer as equivalents to Anglo-American cataloguing terminology, RDA does include concepts relatively new to bibliographic description./

Do take a look at the post--comments and conversation welcome.

Regards,
Diane Hillmann
[CODE4LIB] Job Posting: Digital Archivist (UVa, Charlottesville, VA)
Hi All,

The University of Virginia Library in Charlottesville, VA has just posted a new position for a Digital Archivist (http://bit.ly/Rhhws). This is a two-year position, funded by a grant from the Andrew W. Mellon Foundation, to develop an inter-institutional model for stewardship of born-digital collections. Review of applications will begin November 2, 2009. If you have questions, please do not hesitate to contact Al Sapienza at ams...@virginia.edu

The University of Virginia Library seeks a talented and dynamic individual to serve as Digital Archivist on a two-year grant funded by the Andrew W. Mellon Foundation. This position will provide key leadership to a cohort of digital archivists from partner institutions (national and international) on this exciting initiative entitled: Born Digital Collections: An Inter-Institutional Model for Stewardship (AIMS). Reporting to the Director of Digital Curation Services, this position will provide the methodology and integration of archival practices to an ever-growing corpus of materials used by scholars, authors, and other notables: namely, born digital content. This is a collaborative project that will require the coordination of complex activities across several other institutions. The Digital Archivist will participate in the creation of a best practices manual for archivists and stewards of born-digital collections. This is an exciting opportunity to work at the crossroads of special collections materials and new technologies.

Qualifications:

Required: Master's degree from an ALA-accredited program for library and information science and/or Master's degree in history or related discipline.

Preferred: Candidates should have a broad understanding of archival and digital technology-related activities in an academic research library setting as well as knowledge of emerging trends in digital technologies and archival practice and where they might intersect.
They should have demonstrated organizational skills in planning, prioritizing, and achieving goals in addition to excellent oral and written communication skills including presentation experience. Candidates should possess knowledge of digital archival and records management principles and practices, as well as the systems and automation techniques utilized, including familiarity with EAD, MODS, METS, XML/XSL and other data structure standards relevant to the archival control of digital collection materials. They should also have the demonstrated ability to work with databases and to develop functional requirements and workflows for programmers building new content management applications. Candidates should possess professional archival or digital records management experience with demonstrated professional accomplishments. The ability to provide leadership and to work independently and collaboratively in a team environment is critical.

Environment: The University of Virginia Library (http://www.lib.virginia.edu) is a leader in innovative customer service, the development of digital library initiatives and infrastructure, and is recognized for the strength and variety of its collections. The Library system consists of twelve libraries, with independent libraries for health sciences, law, and business. The libraries support 13,000 undergraduates, 6,500 graduate students and 1,600 teaching faculty. The University and the Library have a strong commitment to achieving diversity among faculty and staff. The Neoclassical buildings of founder Thomas Jefferson's Academical Village still serve as the center of the University's Grounds (http://www.virginia.edu/uvatours/slideshow/) and as a unique backdrop for teaching, learning, and research.

Salary and Benefits: Competitive depending on qualifications.
This position has Administrative and Professional faculty status with excellent benefits, including 22 days of vacation and TIAA/CREF and other retirement plans.

Review of applications will begin on November 2nd, 2009 and the position will be open until filled. Applicants must apply through the University of Virginia online employment website at https://jobs.virginia.edu/. Search by position number FP677, complete application, and attach cover letter and resume, with contact information for three current, professional references. For assistance with this process contact Library Human Resources at (434) 924-3081.

The University of Virginia is an Equal Opportunity/Affirmative Action employer strongly committed to achieving excellence through cultural diversity. The University actively encourages applications and nominations from members of underrepresented groups.
[CODE4LIB] lingua::stem::snowball
Can someone help me use Lingua::Stem::Snowball more efficiently?

I want to count the total number of times a word stem appears in a hash. Here is a short example:

    use strict;
    use Lingua::Stem::Snowball;

    my $idea  = 'books';
    my %words = (
        'books'         => 5,
        'library'       => 6,
        'librarianship' => 5,
        'librarians'    => 3,
        'librarian'     => 3,
        'book'          => 3,
        'museums'       => 2,
    );

    my $stemmer   = Lingua::Stem::Snowball->new( lang => 'en' );
    my $idea_stem = $stemmer->stem( $idea );
    print "$idea ($idea_stem)\n";

    # stem every key and add up the counts whose stem matches the idea
    my $total = 0;
    foreach my $word ( keys %words ) {
        my $word_stem = $stemmer->stem( $word );
        print "\t$word ($word_stem)\n";
        if ( $idea_stem eq $word_stem ) { $total += $words{ $word } }
    }
    print "$total\n";

In the end, the value of $total equals 8. That is, more or less, what I expect, but how can I make the foreach loop more efficient? In reality, my application fills %words with as many as 150,000 keys. Moreover, $idea is really just a single element in an array of about 100 words. Doing the math, the if statement in my foreach loop will get executed as many as 15,000,000 times. To make matters even worse, I plan to run the whole program about 10,000 times. That is a whole lot of processing just to count words!

Is there some way I could short-circuit the foreach loop? I saw Lingua::Stem::Snowball's stem_in_place method, but to use it I must pass it an array, disassociating my keys from their values.

Second, is there a way I can make the stemming more aggressive? For example, I was hoping the stem of library would equal the stems of library, librarianship, and librarian, but alas, they don't.

Any suggestions?

--
Eric Lease Morgan
Re: [CODE4LIB] lingua::stem::snowball
It's been a while since I perled, so this might not be the most idiomatic solution, but you could stem the entire words hash once and create a hash of all the sums (%words_stems), then run the list of idea words (@ideas), checking only the desired stems:

    use strict;
    use Lingua::Stem::Snowball;

    my @ideas = ( 'books', 'otters', 'library' );
    my %words = (
        'books'         => 5,
        'library'       => 6,
        'librarianship' => 5,
        'librarians'    => 3,
        'librarian'     => 3,
        'book'          => 3,
        'museums'       => 2,
    );

    my %words_stems;    # stem => sum of the counts of all words with that stem
    my $stemmer = Lingua::Stem::Snowball->new( lang => 'en' );

    # one pass over %words: stem each key once and accumulate its count
    foreach my $word ( keys %words ) {
        $words_stems{ $stemmer->stem($word) } += $words{$word};
    }

    # each idea is now a single stem call plus a hash lookup
    foreach my $idea (@ideas) {
        my $idea_stem = $stemmer->stem( $idea );
        print "$idea ($idea_stem)\n";
        print $words_stems{$idea_stem} . "\n";
    }

The first foreach loop is executed once per word in %words, while the second foreach loop gets run once per item in @ideas. So 150,000 words with 100 ideas would call the stem function (which is presumably where all the cost is) only about 150,100 times.

If you plan on doing something similar later, you could save that hash to disk, btw.

Ben

--
Benjamin Florin
Technology Assistant for Blended Education
Simmons College GSLIS
617-521-2842
benjamin.flo...@simmons.edu
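The idea of saving the hash to disk could be sketched with Perl's core Storable module. This is only a minimal sketch, not code from the thread: the file name stem_counts.sto and the helper load_or_build are made up for illustration, and the builder passed in here returns placeholder values where the real program would run the stemming loop that fills %words_stems.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Hypothetical helper: return the cached hash if $file exists,
# otherwise call $build->() to create it and write it to $file.
sub load_or_build {
    my ( $file, $build ) = @_;
    return retrieve($file) if -e $file;
    my $hash = $build->();
    store( $hash, $file );
    return $hash;
}

# Stand-in for the expensive stemming pass; the values are placeholders.
my $stem_sums = load_or_build(
    'stem_counts.sto',    # assumed file name
    sub { return { book => 8, museum => 2 } },
);

print "book: $stem_sums->{book}\n";
```

On later runs the file already exists, so the builder is never called. The cache would need to be deleted by hand whenever %words changes.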
Re: [CODE4LIB] lingua::stem::snowball
Presumably the call to stem() is the expensive part of your loop, so I'd want to cut that out if that is true. It looks to me that you can pass in an array reference to stem(), so there's no need for calling stem() in a loop at all. I'd think something like the code below should help reduce your calls to stem() to one call for the idea and one call for the list of words.

Note I used a sorted set of keys in order to assure that I keep the counts and the words that are stemmed in the same order when adding up the totals. The sort could be expensive too, so this may not work out better for you, depending on your input data and the performance of sort() and stem(). You could also use stem_in_place() if you don't want to make a copy of the array.

Changing to use an array of @ideas instead of the scalar $idea would use an analogous technique.

Matt

    use strict;
    use Lingua::Stem::Snowball;

    my $idea  = 'books';
    my %words = (
        'books'         => 5,
        'library'       => 6,
        'librarianship' => 5,
        'librarians'    => 3,
        'librarian'     => 3,
        'book'          => 3,
        'museums'       => 2,
    );

    my $stemmer   = Lingua::Stem::Snowball->new( lang => 'en' );
    my $idea_stem = $stemmer->stem( $idea );
    print "$idea ($idea_stem)\n";

    # stem all keys in a single call; @stemwords is parallel to @wordkeys
    my @wordkeys  = sort( keys(%words) );
    my @stemwords = $stemmer->stem( \@wordkeys );

    my $i     = 0;
    my $total = 0;
    foreach my $word (@wordkeys) {
        if ( $idea_stem eq $stemwords[$i] ) { $total += $words{ $word } }
        $i++;
    }
    print "$total\n";
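The analogous technique for an array of @ideas might look like the sketch below, which combines the single batched stem call with a hash of sums keyed by stem. It is an illustration under assumptions rather than tested code from the thread: sum_by_stem and $toy_stem are hypothetical names, and the toy stemmer (it only strips one trailing "s") is there so the sketch runs without CPAN modules. With the real module, the code reference would be sub { $stemmer->stem( $_[0] ) }.

```perl
use strict;
use warnings;

# Sum the counts in %$words by stem. $stem_list is a code reference that
# takes an array reference of words and returns the list of their stems,
# in the same order.
sub sum_by_stem {
    my ( $words, $stem_list ) = @_;
    my @keys  = keys %$words;
    my @stems = $stem_list->( \@keys );    # one batched call
    my %sums;
    $sums{ $stems[$_] } += $words->{ $keys[$_] } for 0 .. $#keys;
    return \%sums;
}

# Toy stand-in stemmer (assumption): lowercases and strips one trailing "s".
my $toy_stem = sub {
    return map { my $s = lc $_; $s =~ s/s\z//; $s } @{ $_[0] };
};

my %words = (
    books => 5, library => 6, librarians => 3,
    librarian => 3, book => 3, museums => 2,
);
my @ideas = ( 'books', 'otters', 'library' );

my $sums       = sum_by_stem( \%words, $toy_stem );
my @idea_stems = $toy_stem->( \@ideas );   # one batched call for the ideas

for my $i ( 0 .. $#ideas ) {
    my $total = $sums->{ $idea_stems[$i] } // 0;    # 0 when no word shares the stem
    print "$ideas[$i] ($idea_stems[$i]): $total\n";
}
```

Because the keys are copied into @keys once and the stems come back in the same order, no sort is needed to keep words and counts aligned.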