[CODE4LIB] Registry blog post of interest ...

2009-10-12 Thread Diane I. Hillmann

As some of you know, the RDA registrars have been working with the
Deutsche Nationalbibliothek to enable a German language translation of
the RDA elements and vocabularies to be available using the same
mechanism as the English original.  Today, Veronika Leibrecht, who's
been working on this, added a new post to the Registry Blog
(http://metadataregistry.org/blog) giving some information on how that
process looks close up.  A bit of a taste:

   A prerequisite for the registering of our terms in the NSDL
   Registry and one of the greatest challenges for the German National
   Library at the moment is the translation of the RDA elements and
   vocabularies.  Since bibliographic description is executed with a
   highly specialised vocabulary, we are finding that the process of
   finding the appropriate terms is interesting but also highly
   involved. Although the existing German rules for bibliographic
   description (RAK) and the authority files for subject headings
   (Schlagwortnormdatei, or SWD) have plenty of vocabulary to offer as
   equivalents to Anglo-American cataloguing terminology, RDA does
   include concepts relatively new to bibliographic description.

Do take a look at the post--comments and conversation welcome.

Regards,
Diane Hillmann


[CODE4LIB] Job Posting: Digital Archivist (UVa, Charlottesville, VA)

2009-10-12 Thread Graham, Wayne (wsg4w)
Hi All,

The University of Virginia Library in Charlottesville, VA has just posted a new 
position for a Digital Archivist (http://bit.ly/Rhhws). This is a two-year 
position, funded by a grant from the Andrew W. Mellon Foundation, to develop an 
inter-institutional model for stewardship of born-digital collections.

Review of applications will begin November 2, 2009.


If you have questions, please do not hesitate to contact Al Sapienza at 
ams...@virginia.edu

=

The University of Virginia Library seeks a talented and dynamic individual to 
serve as Digital Archivist to a two-year grant funded by the Andrew W. Mellon 
Foundation. This position will provide key leadership to a cohort of digital 
archivists from partner institutions (national and international) on this 
exciting initiative entitled: Born Digital Collections: An Inter-Institutional 
Model for Stewardship (AIMS). Reporting to the Director of Digital Curation 
Services, this position will provide the methodology and integration of 
archival practices to an ever-growing corpus of materials used by scholars, 
authors, and other notables: namely, born digital content. This is a 
collaborative project that will require the coordination of complex activities 
across several other institutions. The Digital Archivist will participate in 
the creation of a best practices manual for archivists and stewards of 
born-digital collections. This is an exciting opportunity to work at the 
crossroads of special collections materials and new technologies.
Qualifications: Required: Master's degree from an ALA-accredited program for 
library and information science and/or Master's degree in history or related 
discipline.

Preferred: Candidates should have a broad understanding of archival and digital 
technology-related activities in an academic research library setting as well 
as knowledge of emerging trends in digital technologies and archival practice 
and where they might intersect. They should have demonstrated organizational 
skills in planning, prioritizing, and achieving goals in addition to excellent 
oral and written communication skills including presentation experience. 
Candidates should possess knowledge of digital archival and records management 
principles and practices, as well as the systems and automation techniques 
utilized which includes familiarity with EAD, MODS, METS, XML/XSL and other 
data structure standards relevant to the archival control of digital collection 
materials. They should also have the demonstrated ability to work with 
databases, develop functional requirements and workflows for programmers 
building new content management applications. Candidates should possess 
professional archival or digital records management experience with 
demonstrated professional accomplishments. The ability to provide leadership 
and to work independently and collaboratively in a team environment is critical.

Environment:  The University of Virginia Library (http://www.lib.virginia.edu) 
is a leader in innovative customer service, the 
development of digital library initiatives and infrastructure, and is 
recognized for the strength and variety of its collections.  The Library system 
consists of twelve libraries, with independent libraries for health sciences, 
law, and business. The libraries support 13,000 undergraduates, 6,500 graduate 
students and 1,600 teaching faculty. The University and the Library have a 
strong commitment to achieving diversity among faculty and staff. The 
Neoclassical buildings of founder Thomas Jefferson's Academical Village still 
serve as the center of the University's Grounds 
(http://www.virginia.edu/uvatours/slideshow/) and as a unique backdrop for 
teaching, learning, and research.
Salary and Benefits:  Competitive depending on qualifications. This position 
has Administrative and Professional faculty status with excellent benefits, 
including 22 days of vacation and TIAA/CREF and other retirement plans. Review 
of applications will begin on November 2nd, 2009 and the position will be open 
until filled.  Applicants must apply through the University of Virginia online 
employment website at https://jobs.virginia.edu/. 
Search by position number FP677, complete application, and attach cover 
letter and resume, with contact information for three current, professional 
references.  For assistance with this process contact Library Human Resources 
at (434) 924-3081.
The University of Virginia is an Equal Opportunity/Affirmative Action employer 
strongly committed to achieving excellence through cultural diversity. The 
University actively encourages applications and nominations from members of 
underrepresented groups.


[CODE4LIB] lingua::stem::snowball

2009-10-12 Thread Eric Lease Morgan
Can someone help me use Lingua::Stem::Snowball more efficiently?

I want to count the total number of times a word stem appears in a  
hash. Here is a short example:


use strict;
use Lingua::Stem::Snowball;
my $idea  = 'books';
my %words = ( 'books'         => 5,
              'library'       => 6,
              'librarianship' => 5,
              'librarians'    => 3,
              'librarian'     => 3,
              'book'          => 3,
              'museums'       => 2
            );
my $stemmer   = Lingua::Stem::Snowball->new( lang => 'en' );
my $idea_stem = $stemmer->stem( $idea );
print "$idea ($idea_stem)\n";
my $total = 0;
foreach my $word ( keys %words ) {
  my $word_stem = $stemmer->stem( $word );
  print "\t$word ($word_stem)\n";
  if ( $idea_stem eq $word_stem ) { $total += $words{ $word } }
}
print "$total\n";


In the end, the value of $total equals 8. That is, more or less, what  
I expect, but how can I make the foreach loop more efficient? In  
reality, my application fills %words with as many as 150,000 keys.  
Moreover, $idea is really just a single element in an array of about  
100 words. Doing the math, the if statement in my foreach loop will  
get executed as many as 15,000,000 times. To make matters even worse, I  
plan to run the whole program about 10,000 times. That is a whole lot  
of processing just to count words!

Is there some way I could short-circuit the foreach loop? I saw  
Lingua::Stem::Snowball's stem_in_place method, but to use it I must  
pass it an array, disassociating my keys from their values.

Second, is there a way I can make the stemming more aggressive? For  
example, I was hoping the stem of library would equal the stems of  
librarians, librarianship, and librarian, but alas, it doesn't.

Any suggestions?

-- 
Eric Lease Morgan


Re: [CODE4LIB] lingua::stem::snowball

2009-10-12 Thread Benjamin Florin
It's been a while since I perled, so this might not be the most
idiomatic solution, but you could stem the entire %words hash once
and create a hash of all the sums (%words_stems), then run the list of
idea words (@ideas), checking only the desired stems:

use strict;
use Lingua::Stem::Snowball;
my @ideas = ('books', 'otters', 'library');
my %words = ( 'books'         => 5,
              'library'       => 6,
              'librarianship' => 5,
              'librarians'    => 3,
              'librarian'     => 3,
              'book'          => 3,
              'museums'       => 2
            );
my %words_stems = ();   # an empty list, not {} (which is a hash reference)
my $stemmer     = Lingua::Stem::Snowball->new( lang => 'en' );

# Stem every word once and sum the counts per stem.
foreach my $word (keys %words)
{
    $words_stems{$stemmer->stem($word)} += $words{$word};
}

# Look up each idea's stem; an idea with no matching word prints 0.
foreach my $idea (@ideas)
{
    my $idea_stem = $stemmer->stem( $idea );
    print "$idea ($idea_stem)\n";
    print( ( $words_stems{$idea_stem} // 0 ) . "\n" );
}

The first foreach loop is executed once per word in %words, while the
second foreach loop gets run once per item in @ideas. So 150,000 words
with 1,000 ideas would call the stem function (which is presumably
where all the cost is) only about 151,000 times.

If you plan on doing something similar later, you could save that hash
to disk, btw.
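
A minimal sketch of that, assuming the core Storable module is acceptable
for the job. The temp file and the totals below are made-up stand-ins for
a real cache path and a real %words_stems:

use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

# Illustrative totals only; a real run would fill this from the stemmer.
my %words_stems = ( book => 8, librari => 6, librarian => 6 );

# A throwaway temp file keeps the sketch self-contained; a real script
# would use a fixed path of its own choosing.
my ( $fh, $cache_file ) = tempfile( UNLINK => 1 );
close $fh;

# Write the hash once ...
nstore( \%words_stems, $cache_file );

# ... and read it back (as a hash reference) on a later run.
my $cached = retrieve($cache_file);

print "book: $cached->{book}\n";

Since the plan is to run the program thousands of times, reading the
totals back should be cheaper than re-stemming, provided the word list
itself has not changed between runs.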

Ben

-- 
Benjamin Florin
Technology Assistant for Blended Education
Simmons College GSLIS
617-521-2842
benjamin.flo...@simmons.edu


Re: [CODE4LIB] lingua::stem::snowball

2009-10-12 Thread Matt Jones
Presumably the call to stem() is the expensive part of your loop, so I'd
want to cut that out if that is true. It looks to me that you can pass in an
array reference to stem(), so there's no need for calling stem() in a loop
at all.   I'd think something like the code below should help reduce your
calls to stem() to one call for the idea and one call for the list of
words. Note I used a sorted set of keys in order to ensure that I keep the
counts and the words that are stemmed in the same order when adding up the
totals.  The sort could be expensive too, so this may not work out better
for you, depending on your input data and the performance of sort() and
stem(). You could also use stem_in_place() if you don't want to make a copy
of the array.  Changing to use an array of @ideas instead of the scalar
$idea would use an analogous technique.

Matt

use strict;
use Lingua::Stem::Snowball;
my $idea  = 'books';
my %words = ( 'books'         => 5,
              'library'       => 6,
              'librarianship' => 5,
              'librarians'    => 3,
              'librarian'     => 3,
              'book'          => 3,
              'museums'       => 2
            );
my $stemmer   = Lingua::Stem::Snowball->new( lang => 'en' );
my $idea_stem = $stemmer->stem( $idea );
print "$idea ($idea_stem)\n";
my @wordkeys = sort(keys(%words));
my @stemwords = $stemmer->stem( \@wordkeys );
my $i = 0;
my $total = 0;
foreach my $word (@wordkeys) {
    if ( $idea_stem eq $stemwords[$i] ) { $total += $words{ $word } }
    $i++;
}
print "$total\n";
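
For the @ideas case, a rough sketch along the same lines might look like
the following. It assumes stem() returns one stem per input word, in
input order, when given an array reference in list context. The names
%stem_totals and %idea_totals are made up for illustration, and the
// operator needs Perl 5.10 or later:

use strict;
use warnings;
use Lingua::Stem::Snowball;

my @ideas = ( 'books', 'otters', 'library' );
my %words = ( 'books'         => 5,
              'library'       => 6,
              'librarianship' => 5,
              'librarians'    => 3,
              'librarian'     => 3,
              'book'          => 3,
              'museums'       => 2
            );
my $stemmer = Lingua::Stem::Snowball->new( lang => 'en' );

# One batch call for the word list and one for the idea list.
my @wordkeys   = keys %words;
my @word_stems = $stemmer->stem( \@wordkeys );
my @idea_stems = $stemmer->stem( \@ideas );

# Sum the counts per stem; @wordkeys and @word_stems share an index order.
my %stem_totals;
for my $i ( 0 .. $#wordkeys ) {
    $stem_totals{ $word_stems[$i] } += $words{ $wordkeys[$i] };
}

# Look up each idea's stem; ideas with no matching word count as 0.
my %idea_totals;
for my $i ( 0 .. $#ideas ) {
    $idea_totals{ $ideas[$i] } = $stem_totals{ $idea_stems[$i] } // 0;
}

print "$_: $idea_totals{$_}\n" for @ideas;

Because the keys are copied into @wordkeys once and never reordered, the
sort is not needed here to keep words and stems aligned.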