Re: [CODE4LIB] indexing word documents using solr [diacritics]
Ah, the wonderful world of character encoding...

To quote the Solr wiki: "There are no known bugs with Solr's character handling, but there have been some reported issues with the way different application servers (and different versions of the same application server) treat incoming and outgoing multibyte characters. In particular, people have reported better success with Tomcat than with Jetty..." (https://wiki.apache.org/solr/FAQ#Why_don.27t_International_Characters_Work.3F)

I'd probably start by enabling UTF-8 in Tomcat/Jetty and see if that resolves the issue. If not, I'd check the original files to see what their character encoding is, and then check each application that handles the documents to make sure it's using that encoding. It might be that the originals aren't in UTF-8, or if they are, that somewhere along the way the parser, the Perl interface, or some other unknown culprit is attempting to change it.

Regards,
Karl Holten
Systems Integration Specialist
SWITCH Inc
414-382-6711

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Eric Lease Morgan
Sent: Thursday, February 12, 2015 2:38 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] indexing word documents using solr [diacritics]

How do I retain diacritics in a Solr index, and how do I search for words containing them?

I have extracted the plain text out of a set of Word documents. I have then used a Perl interface (WebService::Solr) to add the plain text to a Solr index using a field type called text_general:

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory" />
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
      <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory" />
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
      <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
  </fieldType>

It seems as if I am unable to search for words like ejecución because the diacritic gets in the way. What am I doing wrong?

-- Eric
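One way to act on the suggestion of checking the original files' encoding is to test whether the extracted text is valid UTF-8 before it is sent to Solr. The sketch below is only an illustration in Python, not part of the Perl workflow described in this thread; the function name is_valid_utf8 is hypothetical, and the sample word is taken from the question above.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


word = "ejecución"

# The same word produces different bytes under different encodings.
utf8_bytes = word.encode("utf-8")
latin1_bytes = word.encode("latin-1")

# Bytes written as Latin-1 are not valid UTF-8 here, so a strict UTF-8
# consumer would reject or mangle them.
print(is_valid_utf8(utf8_bytes))
print(is_valid_utf8(latin1_bytes))

# Reading UTF-8 bytes as Latin-1 yields the familiar garbled form,
# which no longer matches a query for the original word.
mojibake = utf8_bytes.decode("latin-1")
print(mojibake)
```

To check a real file, read it in binary mode (for example open(path, "rb").read()) and pass the bytes to the helper. A failed check suggests the text was produced in some other encoding and should be converted before indexing; a passed check only shows the bytes are well-formed UTF-8, not that every later step preserves them.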
[CODE4LIB] Survey on Programming Languages, Frameworks, and Web Content Management Systems used in Libraries
This is a gentle reminder that you are invited to participate in a research study about the use of programming languages, frameworks, and web content management systems used in libraries. Thank you to all who have already participated in the survey!

You must be 18 years or older and employed (either full or part-time) in a library or archive organization. We strongly encourage those who have knowledge of and experience with programming languages, application development, or scripting and are employed in libraries to respond to the survey.

To participate, please click the survey link below:

https://www.surveymonkey.com/s/D7L68NQ

If you decide to participate in this study, you will be asked to respond to approximately 22 questions in an online survey. The survey will take approximately 25 minutes of your time. Your responses are anonymous and any potentially identifying information will be removed from the response data during analysis. Research findings from this study will be disseminated widely through an open-access publication and via the ACRL TechConnect blog.

Remember, this is completely voluntary. You can choose to be in the study or not. If you'd like to participate or have any questions about the study, please email or contact the study's principal investigator at lauren.magnu...@csun.edu.

The survey will be open until March 15th.

Thank you,
Lauren Magnuson, CSU Northridge
Bohyun Kim, University of Maryland, Baltimore
Eric Phetteplace, California College of the Arts
Margaret Heller, Loyola University, Chicago
Re: [CODE4LIB] Info request - Library Hackathon for students
Some DPLA Community Reps put together a hackathon planning guide last fall (http://dp.la/info/2014/10/07/dpla-community-reps-produce-hackathon-planning-guide-now-available/). It was based in part on some notes I made after planning a hackathon for the Texas Digital Library last spring, which, however, was directed mostly at librarians wanting to dip their toes into tech.

Speaking of DPLA, the applications for the third round of community reps close tomorrow, Feb. 13. It's a great way to learn more about DPLA and to share that knowledge with your community! http://dp.la/info/2015/01/15/apply-to-dpla-reps-third-class/

Danielle

--
Danielle Cunniff Plumer
dcplumer associates
512-508-3099
danie...@dcplumer.com

On Thu, Feb 12, 2015 at 9:51 AM, Heather Claxton claxt...@gmail.com wrote:

My husband's company uses student hack-a-thons as recruitment tools. It gives them a chance to see what the students can do, talk to them in a casual manner, offer mentoring, etc. Generally, they sponsor a prize as a thank-you for letting them observe the hack-a-thon.

On the flip side, it's a great marketing ploy on the organizers' end, since a lot of senior students are starting to look for potential job opportunities and will participate purely for that reason. You could probably contact your university career center to help you find an interested local sponsor.

Good luck! I hope it turns out well.

On Wed, Feb 11, 2015 at 9:37 AM, Craig Boman craig.bo...@gmail.com wrote:

Dear Code4Lib,

Has your library ever hosted a hackathon for university students? If so, would you do it again? Is there anything you wish you had known before hosting the hackathon?

From the list archives, it looks like most of the hackathons at libraries have been for librarians, rather than university students. Please feel free to share any ideas.

Thanks,
Craig Boman
Applications Support Specialist
University of Dayton Libraries
300 College Park
Dayton, OH, 4569