Re: [CODE4LIB] text mining software
Do any of these work in Hadoop, using MapReduce as a programming model? It seems like Hadoop would be a natural fit for text mining and analysis.

Alan

On Aug 27, 2013, at 7:44 PM, Riley, Jenn <jlri...@email.unc.edu> wrote:

This is still command-line, but Mallet is heavily used in the DH community: http://mallet.cs.umass.edu/. I think MONK (http://monkproject.org/) has a UI, but I'm not overly familiar with its features.

Jenn

Jenn Riley
Head, Carolina Digital Library and Archives
The University of North Carolina at Chapel Hill
http://cdla.unc.edu/
http://www.lib.unc.edu/users/jlriley
jennri...@unc.edu
(919) 843-5910

On 8/27/13 11:24 AM, Eric Lease Morgan <emor...@nd.edu> wrote:

What sorts of text mining software do y'all support / use in your libraries?

We here in the Hesburgh Libraries at the University of Notre Dame have all but opened a place called the Center for Digital Scholarship. We are / will be providing a number of different services to a number of different audiences. These services include, but are not necessarily limited to:

* data management consultation
* data analysis and visualization
* geographic information systems support
* text mining investigations
* referrals to other centers across campus

I am expected to support the text mining investigations. I have traditionally used open source tools to do my work. Many of these tools require some sort of programming in order to exploit. To some degree I am expected to mount text mining software on our local Windows and Macintosh computers here in our Center.

I am familiar with the lists of tools available at Bamboo as well as Hermeneuti.ca. [0, 1] TAPoRware is good too, but a bit long in the tooth. [2] Do you know of other sets of tools to choose from? Are you familiar with SAS Text Analytics, STATISTICA Data Miner, or RapidMiner? [3, 4, 5]

[0] Bamboo DiRT - http://dirt.projectbamboo.org
[1] Hermeneuti.ca - http://hermeneuti.ca/voyeur/tools
[2] TAPoRware - http://taporware.ualberta.ca
[3] SAS Text Analytics - http://www.sas.com/text-analytics/
[4] STATISTICA Data Miner - http://www.statsoft.com/Products/STATISTICA/Data-Miner/
[5] RapidMiner - http://rapid-i.com/content/view/181/190/

--
Eric Lease Morgan, Digital Initiatives Librarian
Hesburgh Libraries, University of Notre Dame
574/631-8604
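To Alan's question: none of the tools named above runs on Hadoop out of the box, as far as I know, but the MapReduce model fits basic text mining well. Below is a minimal sketch of the canonical word-frequency job written for Hadoop Streaming, which lets any stdin/stdout script act as mapper and reducer. The tokenizer, file names, and paths are illustrative assumptions, not part of any tool mentioned above.

#!/usr/bin/env python
# mapper.py - tokenize each input line, emit one "word<TAB>1" pair per token
import re
import sys

for line in sys.stdin:
    for word in re.findall(r"[a-z']+", line.lower()):
        print(f"{word}\t1")

#!/usr/bin/env python
# reducer.py - Hadoop sorts mapper output by key, so identical words arrive
# together; sum the counts for each run of equal keys
import sys

current, count = None, 0
for line in sys.stdin:
    word, n = line.rstrip("\n").split("\t")
    if word != current:
        if current is not None:
            print(f"{current}\t{count}")
        current, count = word, 0
    count += int(n)
if current is not None:
    print(f"{current}\t{count}")

A job like this would be launched with something along the lines of "hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /corpus -output /counts" (the exact path to the streaming jar varies by Hadoop version). The nice part is you can test the whole pipeline locally first: cat corpus.txt | ./mapper.py | sort | ./reducer.py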
Re: [CODE4LIB] term co-occurrence analysis?
Bess, MarkLogic has community and academic editions that do a really good job at co-occurrence analysis. We use it for all our XML projects. http://www.marklogic.com/products-and-services/marklogic-6/

Alan

Alan Darnell
Scholars Portal

On Oct 1, 2012, at 7:55 PM, Bess Sadler <bess.sad...@gmail.com> wrote:

For a full-text search system we're prototyping, we are being asked to provide term co-occurrence analysis. I'm not very familiar with this concept, so maybe someone on the list can describe it better, but I believe what is wanted is the ability to query a text corpus for a given word and receive in return a list of words that co-occur with the search term, along with some indication of how often they co-occur. Something like this IBM Many Eyes demo: http://www-958.ibm.com/software/data/cognos/manyeyes/visualizations/clint-eastwood-applause-lines-at-r (though we're not necessarily looking for a visualization, just a way to do the query).

Some Google searching gives me lots of scholarly articles from computational linguistics and humanities computing, but nothing like "here's a recipe for how to do this in Solr," which is what I would really love. Has anyone done this? How did you approach it? Are there tools you can recommend? Articles or books I should read?

Many thanks in advance,
Bess
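For anyone who, like Bess, wants a recipe rather than a product: the underlying computation is straightforward. Here is a minimal, self-contained sketch in Python - an illustration of the concept, not MarkLogic's or Solr's mechanism - that takes a query term, counts every other word appearing within an N-token window of it across a corpus, and ranks the co-occurring words by frequency. The tokenizer, window size, and sample corpus are assumptions for demonstration.

# Windowed term co-occurrence: for a query term, rank the words that
# appear within `window` tokens of it anywhere in the corpus.
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def cooccurrences(docs, term, window=5):
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for i, tok in enumerate(tokens):
            if tok == term:
                lo, hi = max(0, i - window), i + window + 1
                # count neighbors inside the window, excluding the term itself
                counts.update(t for t in tokens[lo:hi] if t != term)
    return counts

docs = ["go ahead make my day", "you feel lucky punk well make my day"]
for word, n in cooccurrences(docs, "day").most_common(10):
    print(word, n)

In Solr terms, a rough analogue is to query for the term and facet on a tokenized text field, which returns the indexed terms of the matching documents - document-level co-occurrence rather than the windowed kind, and making it perform at scale is its own project - but the counting above is the idea either way.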
Re: [CODE4LIB] Serials Solutions Summon
It is possible for a consortium to build the same sort of service as Serials Solutions. Besides the OhioLINK example, we've been doing this in Ontario for the last seven years or so: aggregating ejournal content (15 million articles), abstract and index databases (over 100 now, in partnership with ProQuest), and ebooks (about 50,000 commercial ebooks plus more than 170,000 digitized ebooks from the Open Content Alliance).

It is a significant effort to deal with all the data feeds, but as publishers migrate their production processes to XML, we're finding that it gets a little easier each year. We aggregate everything in a single XML database from a company called MarkLogic. The biggest issues we struggle with are currency - it's never as fast as the publisher site, though it isn't far behind when things are working well - and quality control - publisher production processes are shifting to XML, but the quality of the data varies. But hey, it's a library, and these are age-old issues present even in the print world.

Alan

On 4/21/09 2:13 PM, Jonathan Rochkind <rochk...@jhu.edu> wrote:

Peter Murray wrote:
> I don't think it is part of SerSol's business model to offer a feed of
> the full metadata it aggregates, but it does seem to be part of the
> business model to offer an API upon which you could put your own
> interface to the underlying aggregated data.

Yep, it's not presently, but I'm hoping that in the future they expand to that business model as well. I think it's feasible. An API on which you can act on their index is great. But actually having the data to re-index yourself in exactly the way you wanted would give you even more power (if you wanted to do the extra work to get it). And it would still be worth paying SerSol for the work of aggregating and normalizing the data.

Jonathan
Re: [CODE4LIB] Video encoding done - Mashup idea request
There is a software product called Wirecast that syncs the recordings and the slides automatically: http://www.varasoftware.com/products/wirecast/. We've used it for a number of webcast projects and it works well.

Alan

On 16-Mar-07, at 4:38 PM, Smith, Devon wrote:

> 2. Two cameras so the person at the podium can be composed inside
> their slides / demos.

If I understand this, you're thinking that one camera would capture the speaker, and the other would capture the slide presentation. If it's possible, wouldn't it be better to put the slides directly into the video, rather than showing video of the projected slides? Screen capture could give the same option for demos, right? Or is that really tough? What I don't know about video capture/editing could just about fill the Grand Canyon.

Given that, thanks to all the people who've pulled the video together. Community Effort++

/dev

-----Original Message-----
From: Code for Libraries [mailto:[EMAIL PROTECTED]] On Behalf Of Noel Peden
Sent: Friday, March 16, 2007 12:02 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Video encoding done - Mashup idea request

You are all certainly welcome. I'm glad to be able to do the video. Thanks are due to Dan Scott and Karen Schneider for bringing their cameras and providing tapes. Karen shot the first 1.5 days of video, too. Ryan Eby has done pretty much all the work getting the files on Google and posting / embedding on code4lib.org.

I expect to be there next year, and perhaps we can raise the bar a bit more (but not too much). Here are some ideas (suggestions are welcome):

1. Wired microphone for consistent sound
2. Two cameras so the person at the podium can be composed inside their slides / demos.

We'll wait for the year after to try out 'bullet time' Matrix-style shots. :)

Any other suggestions?

Regards,
Noel