Re: [CODE4LIB] indexing word documents using solr [diacritics]

2015-02-12 Thread Karl Holten
Ah, the wonderful world of character encoding...

To quote the Solr wiki:
There are no known bugs with Solr's character handling, but there have been 
some reported issues with the way different application servers (and different 
versions of the same application server) treat incoming and outgoing multibyte 
characters. In particular, people have reported better success with Tomcat than 
with Jetty... 
(https://wiki.apache.org/solr/FAQ#Why_don.27t_International_Characters_Work.3F )

I'd probably start by enabling UTF-8 in Tomcat/Jetty and see if that resolves 
the issue. 
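
To show why the container setting matters, here is a minimal Python sketch. It is only an illustration and is not part of Solr or WebService::Solr. It takes the word from the question below and shows that the same query term is percent-encoded differently under UTF-8 and Latin-1, and what a server would see if it decoded UTF-8 bytes as Latin-1. The variable names are my own.

```python
from urllib.parse import quote, unquote

# The search term from the original question.
term = "ejecución"

# How the term looks in a query string under two different encodings.
utf8_query = quote(term, encoding="utf-8")
latin1_query = quote(term, encoding="latin-1")

# What a server sees if it decodes UTF-8 bytes as Latin-1:
# the two bytes of the accented letter become two separate characters.
misread = unquote(utf8_query, encoding="latin-1")

print(utf8_query)    # ejecuci%C3%B3n
print(latin1_query)  # ejecuci%F3n
print(misread)       # ejecuciÃ³n
```

If the misread form is what reaches the index or the query parser, the term will not match the correctly encoded one, even though nothing is wrong in the schema.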

If not, I'd check the original files to see what their character encoding is, 
and then check each application that handles the documents to make sure it's 
using that encoding. It might be that the originals aren't in UTF-8, or, if 
they are, that somewhere along the way the parser, the Perl interface, or some 
other unknown culprit is attempting to change it.
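
As a rough way to do that first check, something like the following Python sketch could be run over the extracted text. It only tests whether the bytes are valid UTF-8; it cannot tell which legacy encoding a non-UTF-8 file uses. The function name and the labels it returns are my own.

```python
def describe_encoding(data: bytes) -> str:
    """Return a rough label for how the given bytes are encoded."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8 with BOM"
    if all(byte < 0x80 for byte in data):
        # Plain ASCII is valid in UTF-8 and in Latin-1 alike.
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Probably a legacy encoding such as Latin-1 or Windows-1252.
        return "not utf-8"
    return "utf-8"


def describe_file(path: str) -> str:
    """Read a file in binary mode so no decoding happens before the check."""
    with open(path, "rb") as handle:
        return describe_encoding(handle.read())


print(describe_encoding("ejecución".encode("utf-8")))    # utf-8
print(describe_encoding("ejecución".encode("latin-1")))  # not utf-8
```

Running the same check at each stage (the extracted text, then what the script sends to Solr) should show where the encoding changes, if it does.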

Regards,
Karl Holten
Systems Integration Specialist
SWITCH Inc
414-382-6711

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Eric 
Lease Morgan
Sent: Thursday, February 12, 2015 2:38 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] indexing word documents using solr [diacritics]

How do I retain diacritics in a Solr index, and how do I search for words 
containing them?

I have extracted the plain text out of a set of Word documents. I have then 
used a Perl interface (WebService::Solr) to add the plain text to a Solr index 
using a field type called text_general:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

It seems as if I am unable to search for words like ejecución because the 
diacritic gets in the way. What am I doing wrong?

— 
Eric


[CODE4LIB] Survey on Programming Languages, Frameworks, and Web Content Management Systems used in Libraries

2015-02-12 Thread Lauren Magnuson
This is a gentle reminder that you are invited to participate in a research
study about the use of programming languages, frameworks, and web content
management systems used in libraries. Thank you to all who have already
participated in the survey!

You must be 18 years or older and employed (either full or part-time) in a
library or archive organization.  We strongly encourage those who have
knowledge of and experience with programming languages, application
development, or scripting and are employed in libraries to respond to the
survey.  To participate, please click the survey link below:

https://www.surveymonkey.com/s/D7L68NQ

If you decide to participate in this study, you will be asked to respond to
approximately 22 questions in an online survey.  The survey will take
approximately 25 minutes of your time.

Your responses are anonymous and any potentially identifying information
will be removed from the response data during analysis.

Research findings from this study will be disseminated widely through an
open-access publication and via the ACRL TechConnect blog.

Remember, this is completely voluntary. You can choose to be in the study
or not. If you'd like to participate or have any questions about the study,
please email or contact the study’s principal investigator at
lauren.magnu...@csun.edu.

The survey will be open until March 15th.

Thank you,

Lauren Magnuson, CSU Northridge

Bohyun Kim, University of Maryland, Baltimore

Eric Phetteplace, California College of the Arts

Margaret Heller, Loyola University, Chicago


Re: [CODE4LIB] Info request - Library Hackathon for students

2015-02-12 Thread danielle plumer
Some DPLA Community Reps put together a hackathon planning guide last fall (
http://dp.la/info/2014/10/07/dpla-community-reps-produce-hackathon-planning-guide-now-available/).
It was based in part on some notes I made after planning a hackathon for
the Texas Digital Library last spring, which, however, was directed mostly at
librarians wanting to dip their toes into tech.

Speaking of DPLA, the applications for the third round of community reps
close tomorrow, Feb. 13. It's a great way to learn more about DPLA and to
share that knowledge with your community!

http://dp.la/info/2015/01/15/apply-to-dpla-reps-third-class/

Danielle

-- 

Danielle Cunniff Plumer
dcplumer associates
512-508-3099
danie...@dcplumer.com



On Thu, Feb 12, 2015 at 9:51 AM, Heather Claxton claxt...@gmail.com wrote:

 My husband's company uses student hack-a-thons as recruitment tools.  It
 gives them a chance to see what the students can do, talk to them in a
 casual manner, offer mentoring, etc.  Generally, they sponsor a prize as a
 thank you for letting them observe the hack-a-thon.   On the flipside, it's
 a great marketing ploy on the organizers' end, since a lot of senior
 students are starting to look for potential job opportunities, and will
 participate purely for that reason.  You could probably contact your
 university career center to help you find an interested/local sponsor.

 Good luck!  I hope it turns out well.

 On Wed, Feb 11, 2015 at 9:37 AM, Craig Boman craig.bo...@gmail.com
 wrote:

  Dear Code4Lib,
 
  Has your library ever hosted a hackathon for university students? If so,
  would you do it again? Anything you wish you had known before hosting the
  hackathon?
 
  From the list archives, it looks like most of the hackathons at libraries
  have been for librarians, rather than university students. Please feel
  free to share any ideas.
 
  Thanks,
 
  Craig Boman
  Applications Support Specialist
  University of Dayton Libraries
  300 College Park
  Dayton, OH, 4569