[CODE4LIB] 2 Positions Available: Georgia Tech - GALILEO Knowledge Repository Manager and Systems Analyst II (DSpace Technical Lead)

2009-10-16 Thread Bill Anderson
Manager, GALILEO Knowledge Repository
http://www.library.gatech.edu/about/jobs.php#gkr
Position:

The Galileo Knowledge Repository (GKR) Manager is a repository professional 
responsible for the daily project management and overall development of the GKR 
as a service in Georgia. It is a three-year, IMLS-funded position residing at 
the Georgia Tech Library and Information Center, reporting to its Associate 
Director for Technology and Resource Services. The Manager will work closely 
with the grant project PIs (2), the liaisons at each partner institution (6), 
and the GKR committees. S/he will perform promotion and outreach activities as 
well as train and advise institutions with hosted IRs on metadata standards, 
submission to DSpace, copyright issues, and identifying and creating digital 
content. Partner training sessions will be conducted by the GKR Manager, with 
assistance by members of the GKR committees; they will be conducted virtually 
and onsite at the partner institutions. The GKR Manager also will provide 
guidance to the GKR Content Submission Service. These activities will be
targeted at groups of partner institutions’ faculty, librarians,
and archivists at the seven GKR partner sites. The Manager will be a resource 
and lead the training program offered to others involved or interested in 
statewide and consortial IR initiatives. S/he will serve on the GKR committees 
– Steering, Technical, Content and Metadata, Outreach and Evaluation, and the 
Symposium and Workshop Committee. The GKR Manager will chair the Symposium and 
Workshop committee.
Qualifications:

Required - ALA-accredited MLS or equivalent degree. Knowledge of current and 
emerging technologies in Web-based digital library services, particularly 
institutional repositories. Experience with traditional and emerging metadata 
standards, e.g. Dublin Core, EAD, TEI, or others. Knowledge of current digital 
library technologies, standards, and best practices. Knowledge of Web page 
development and of Web-authoring tools and HTML. Ability to provide effective 
individual and group training. Demonstrated ability to plan, initiate, and 
implement effective programs, projects, and services.   Excellent 
organizational skills with aptitude for complex analytical and detailed work.   
Ability to work independently as well as collaboratively in a rapidly changing 
environment. Enthusiastic attitude toward consortial digital library work with 
multiple collaborators.

Preferred - Familiarity with script languages. Knowledge of developing 
interfaces for online resources like repositories. Demonstrated record of 
professional development.   Supervisory experience. Experience with DSpace 
repository software.
Salary:

Salary is competitive and based on qualifications and experience. The minimum 
salary for this position is $46,000. Visit the Georgia Tech Human Resources 
page to view the benefits package.
Environment:

As an ARL library supporting an institution with nearly 5,000 graduate students 
and over $300 million per year in research activity, Georgia Tech provides a 
platform that will challenge and engage creative leaders on the frontier of new 
ways to capture and manage intellectual content. Working with a team of 
librarians, the GT Library provides the operations needed to maintain the 
information resources required to support an exceptionally energetic academic 
enterprise.

The Library and Information Center, a member of the Association of Research 
Libraries, is central to the Institute's instructional and research programs. 
The Georgia Institute of Technology, with nearly 20,000 faculty, students, and 
staff is one of the nation's outstanding universities, with nationally 
recognized programs in science and engineering. The Library is a leader in 
library automation, participates in the statewide consortium, GALILEO, and 
provides access to an ever-increasing number of databases, electronic books and 
electronic journals.
Application Process:

Applications will be reviewed upon receipt and will be accepted until the 
position is filled.   Employment is contingent on proof of the legal right to 
work in the United States. Send letter of application, resume, and names, 
addresses, phone numbers and e-mail addresses of five references to:
Sharon Baines, SPHR
HR Officer
Library and Information Center
Georgia Institute of Technology
Atlanta, GA 30332-0900
sharon.bai...@library.gatech.edu

Systems Analyst II (DSpace Technical Lead)
http://www.library.gatech.edu/about/jobs.php#sa2

Position:

Responsible for the technical management and development of four hosted DSpace 
sites, assisting with three existing DSpace repositories, and a searchable 
metadata repository site, as part of the GALILEO Knowledge Repository (GKR) 
project, the statewide digital repository of Georgia.

This position reports directly to the GKR Manager with a reporting relationship 
to the GT Library's Associate Director for Technology and Resource 

[CODE4LIB] solr - search query count | highlighting

2009-10-16 Thread Eric James
For our finding aids, we are using fedoragenericsearch 2.2 with solr as index.  
Because the EADs can be huge, the EADs are indexed but not stored (with stored 
EADs, search time for ~500 objects = 20 min rather than < 1 sec).

 

However, we would like to have number of search terms found within each hit.  
For example, CDL's collection:

http://www.oac.cdlib.org/search?query=Donner

 

Also we would like highlighting/snippets of the search term similar to CDL's.

 

Is it a lost cause to have this functionality without storing the EAD?  Is 
there a way to store the EAD and have a reasonable response time?

 

---

Eric James

Yale University Libraries

 

 
  

Re: [CODE4LIB] solr - search query count | highlighting

2009-10-16 Thread Ethan Gruber
Hi Eric,

You do not have to store the entire text content of the EAD guide in order
to enable facets.  Here's an example:
http://kittredgecollection.org/results?q=*:* .  There are about 15 facets
enabled on a collection of almost 1500 EAD documents (though quite small in
filesize compared to traditional EAD finding aids), and there's no slowdown
whatsoever.  I don't believe you need to store the guides to enable
highlighting either, though I have heard there is some dropoff in
performance with highlighting enabled.  I've never done benchmarking on
highlighting enabled versus disabled, so I can't tell you how much of a
dropoff there is.  In an index of only several hundred documents, I would
think that the dropoff with highlighting enabled would be fairly negligible.

Ethan


Re: [CODE4LIB] solr - search query count | highlighting

2009-10-16 Thread Eric James
Thanks for your response.  But, yes I'm able to use facets in general, and yes 
I'm able to do highlighting on stored fields.

 

But finding how many times the query appears in the full text is my question. 
For example, say you search on Heisenberg. We'd like to see:

 

Hit 1: Your search for Heisenberg appears 10 times within the Finding Aid

Hit 2: Your search for Heisenberg appears 3 times within the Finding Aid

Hit 3: Your search for Heisenberg appears 88 times within the Finding Aid

etc

 

Could there be a solr parameter that calculates this? Otherwise a klugey, not 
very scalable method could be that once you retrieve a solr result xml, find 
the fedora pid, retrieve the EAD full text, run a standard function to count 
how many times the query appears in the text for each hit, and add parameters 
back into the xml with these counts. 
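
The fallback described above could be sketched roughly as follows. This is only an illustration under assumptions: `fetch_text` is a hypothetical callable standing in for whatever retrieves the EAD full text for a Fedora pid, and the counting is a plain case-insensitive whole-word match, which will not necessarily agree with the tokens Solr's analyzers produce (stemming, for instance).

```python
import re
from typing import Callable, Dict, Iterable


def count_occurrences(text: str, query: str) -> int:
    """Count case-insensitive, whole-word matches of `query` in `text`."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))


def annotate_hits(
    pids: Iterable[str],
    query: str,
    fetch_text: Callable[[str], str],  # hypothetical: pid -> EAD full text
) -> Dict[str, int]:
    """Return a pid -> count mapping for the hits of one result page."""
    return {pid: count_occurrences(fetch_text(pid), query) for pid in pids}


# In-memory stand-in for the repository, for demonstration only.
SAMPLE = {
    "demo:1": "Heisenberg wrote to Bohr. Bohr replied to Heisenberg.",
    "demo:2": "Letters from Bohr; heisenberg is mentioned once.",
}

print(annotate_hits(SAMPLE, "Heisenberg", SAMPLE.__getitem__))
# {'demo:1': 2, 'demo:2': 1}
```

Because every hit costs one full-text retrieval, this only stays tolerable if it is applied to the hits on the current result page rather than to the whole result set.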

 

 

Re: [CODE4LIB] solr - search query count | highlighting

2009-10-16 Thread Rob Casson
i think some of the new TermVectorComponent stuff might be
applicable...i've not experimented with it yet tho, so YMMV.

 http://wiki.apache.org/solr/TermVectorComponent

it's only part of 1.4, which is due for a release any day now, once
they patch up a Lucene bug
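
A request against the TermVectorComponent might be built along these lines. Treat it as a sketch with assumptions: the handler name `tvrh`, the `id` field, and the base URL are placeholders that depend on the local solrconfig.xml and schema, and the field being queried would need term vectors enabled in the schema. The `tv`, `tv.tf` and `tv.fl` parameters are the ones described on the wiki page.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def term_vector_url(base_url: str, query: str, field: str, handler: str = "tvrh") -> str:
    """Build a select URL asking for per-document term frequencies."""
    params = {
        "q": query,
        "qt": handler,    # assumed name of a handler wired to the component
        "tv": "true",     # switch the TermVectorComponent on
        "tv.tf": "true",  # return term frequency per term in each document
        "tv.fl": field,   # field whose term vectors should be returned
        "fl": "id",       # assumed unique key field; keeps the response small
    }
    return f"{base_url}/select?{urlencode(params)}"


url = term_vector_url("http://localhost:8983/solr", "text:Heisenberg", "text")
print(parse_qs(urlparse(url).query)["tv.tf"])  # ['true']
```

The response lists every term of the field for each returned document, so the client still has to pick out the entry for the analyzed query term.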



Re: [CODE4LIB] solr - search query count | highlighting

2009-10-16 Thread Király Péter

Hi Eric,

If you use the debugQuery=on parameter, you'll receive the explain structure,
which tells you about the factors used to calculate the score. An example:

<str name="oai:URMST:Transformation_Service/1">
1.5076942 = (MATCH) fieldWeight(text:chant in 0), product of:
 1.4142135 = tf(termFreq(text:chant)=2)
 6.8230457 = idf(docFreq=1, numDocs=676)
 0.15625 = fieldNorm(field=text, doc=0)
</str>

Here tf(termFreq(text:chant)=2) tells you that the queried term was found two
times in the document. You should apply a regex to extract this info from the
explain string. Since this term is an analyzed term, it may not be equal to the
user input, but debug's 'parsedquery' parameter tells you which terms Solr
searches for behind the scenes.
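
A minimal sketch of that regex step, assuming the explain text has the same shape as the example above (other Solr versions may format it differently, so the pattern is an assumption rather than a stable contract):

```python
import re
from typing import Dict, Tuple

# Matches fragments such as "termFreq(text:chant)=2" in the explain string.
TERM_FREQ = re.compile(
    r"termFreq\((?P<field>[^:()]+):(?P<term>[^)]+)\)=(?P<count>\d+)"
)


def term_frequencies(explain: str) -> Dict[Tuple[str, str], int]:
    """Map (field, analyzed term) to the term frequency reported for one hit."""
    return {
        (m.group("field"), m.group("term")): int(m.group("count"))
        for m in TERM_FREQ.finditer(explain)
    }


EXPLAIN = """\
1.5076942 = (MATCH) fieldWeight(text:chant in 0), product of:
 1.4142135 = tf(termFreq(text:chant)=2)
 6.8230457 = idf(docFreq=1, numDocs=676)
 0.15625 = fieldNorm(field=text, doc=0)
"""

print(term_frequencies(EXPLAIN))  # {('text', 'chant'): 2}
```

The keys are the analyzed terms, so they should be compared with the terms shown in 'parsedquery' rather than with the raw user input.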

In Lucene, if the field stores the term vector's positions, there are API
calls that give you the exact place of the term within the field (as character
positions, or as the n-th token), but I don't know how to extract this info
through Solr.


Hope this helps.

Király Péter
eXtensible Catalog
http://xcproject.org


Re: [CODE4LIB] solr - search query count | highlighting

2009-10-16 Thread Mark A. Matienzo
On Fri, Oct 16, 2009 at 3:12 PM, Eric James cirese...@hotmail.com wrote:
 For our finding aids, we are using fedoragenericsearch 2.2 with solr as 
 index.  Because the EADs can be huge, the EADs are indexed but not stored 
 (with stored EADs, search time for ~500 objects = 20 min rather than  1 sec).

Eric, what do your actual schema and Solr configuration look like?
One possibility would be to store and index the actual contents of the
EAD in a separate field and not return that field by default in query
responses. For what it's worth, this is what we're doing at NYPL for
our EAD files that are being indexed as part of the new Drupal-based
site we're building.
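
On the query side, that arrangement might look something like the sketch below. The field names `ead_text`, `id` and `title` and the base URL are made up for illustration; `fl`, `hl`, `hl.fl` and `hl.snippets` are standard Solr request parameters. The large field is stored, so it can be highlighted, but it is left out of `fl`, so it is not sent back with every hit.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_select_url(
    base_url: str,
    query: str,
    text_field: str = "ead_text",     # hypothetical stored + indexed full-text field
    return_fields=("id", "title"),    # hypothetical small fields to return
    snippets: int = 3,
) -> str:
    """Build a select URL that highlights the big field without returning it."""
    params = {
        "q": f"{text_field}:{query}",
        "fl": ",".join(return_fields),  # the full-text field is deliberately absent
        "hl": "true",
        "hl.fl": text_field,            # highlight against the stored full text
        "hl.snippets": snippets,        # maximum snippets per document
    }
    return f"{base_url}/select?{urlencode(params)}"


url = build_select_url("http://localhost:8983/solr", "Heisenberg")
print(parse_qs(urlparse(url).query)["fl"])  # ['id,title']
```

Whether this brings the response time back to an acceptable level depends on how large the stored field is, since the highlighter still has to read it for each returned hit.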


Mark A. Matienzo
Applications Developer, Digital Experience Group
The New York Public Library







Re: [CODE4LIB] solr - search query count | highlighting

2009-10-16 Thread Roy Tennant
Maybe you should look into using what CDL uses to get that functionality,
which is also based on Lucene:

http://www.cdlib.org/inside/projects/xtf/

Roy

