Re: [CODE4LIB] text mining software
Alan, if you are looking for data mining software that runs well in Hadoop, I would definitely recommend looking into Apache Mahout [1]. This software is specifically focused on categorization and clustering, and these algorithms tend to work well in the distributed architecture of a Hadoop-based system. If you are looking for parsers, taggers, or tokenizers, then a different system (GATE / OpenNLP / UIMA) would be more appropriate.

-Aaron

[1] http://mahout.apache.org

On Aug 27, 2013, at 7:47 PM, Alan Darnell alan.darn...@utoronto.ca wrote:

Do any of these work in Hadoop using MapReduce as a programming model? It seems like Hadoop would be a natural use case for text mining and analysis.

Alan
[CODE4LIB] text mining software
What sorts of text mining software do y'all support / use in your libraries?

We here in the Hesburgh Libraries at the University of Notre Dame have all but opened a place called the Center For Digital Scholarship. We are / will be providing a number of different services to a number of different audiences. These services include, but are not necessarily limited to:

  * data management consultation
  * data analysis and visualization
  * geographic information systems support
  * text mining investigations
  * referrals to other centers across campus

I am expected to support the text mining investigations. I have traditionally used open source tools to do my work. Many of these tools require some sort of programming in order to exploit. To some degree I am expected to mount text mining software on our local Windows and Macintosh computers here in our Center. I am familiar with the lists of tools available at Bamboo as well as Hermeneuti.ca. [0, 1] TAPoRware is good too, but a bit long in the tooth. [2] Do you know of other sets of tools to choose from? Are you familiar with SAS® Text Analytics, STATISTICA Data Miner, or RapidMiner? [3, 4, 5]

[0] Bamboo Dirt - http://dirt.projectbamboo.org
[1] Hermeneuti.ca - http://hermeneuti.ca/voyeur/tools
[2] TAPoRware - http://taporware.ualberta.ca
[3] Text Analytics - http://www.sas.com/text-analytics/
[4] Data Miner - http://www.statsoft.com/Products/STATISTICA/Data-Miner/
[5] RapidMiner - http://rapid-i.com/content/view/181/190/

--
Eric Lease Morgan, Digital Initiatives Librarian
Hesburgh Libraries
University of Notre Dame
574/631-8604
Re: [CODE4LIB] text mining software
Hi, Eric,

I don't have any experience in this field, but I went looking a while ago when the topic came up, and these two links are in my notes for further exploration, if the topic ever comes around again:

http://wordseer.berkeley.edu/
http://mininghumanities.com/

May they serve you well.

--
HARDY POTTINGER pottinge...@umsystem.edu
University of Missouri Library Systems
http://lso.umsystem.edu/~pottingerhj/
https://MOspace.umsystem.edu/
A child who does not play is not a child, but the man who doesn't play has lost forever the child who lived in him and who he will miss terribly. --Pablo Neruda

On 8/27/13 10:24 AM, Eric Lease Morgan emor...@nd.edu wrote:

What sorts of text mining software do y'all support / use in your libraries?
Re: [CODE4LIB] text mining software
More often seen as a tool for the social sciences, NVivo from QSRI (http://www.qsrinternational.com/products_nvivo.aspx) has some respectable text manipulation capabilities (stemming, counting, proximity, clouds, etc.), and since it is an established tool in certain disciplines, it's either cheap or free on lots of campuses, via institutional licensing. And they have free trials as well.

--DBL

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Pottinger, Hardy J.
Sent: Tuesday, August 27, 2013 11:51 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] text mining software

Hi, Eric,

I don't have any experience in this field, but I went looking a while ago when the topic came up, and these two links are in my notes for further exploration, if the topic ever comes around again:

http://wordseer.berkeley.edu/
http://mininghumanities.com/

May they serve you well.
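For anyone wondering what the stemming, counting, and proximity features mentioned above amount to, here is a minimal Python sketch. It is not how NVivo works internally; the function names (`tokenize`, `naive_stem`, `stem_counts`, `near`) and the suffix list are made up for illustration, and the stemmer is a deliberately crude stand-in for a real one such as Porter's.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word):
    """Strip a few common English suffixes (toy rule set, assumption only)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_counts(text):
    """Count how often each stem occurs in the text."""
    return Counter(naive_stem(w) for w in tokenize(text))


def near(text, a, b, window=3):
    """Count occurrences of stem b within `window` tokens of each stem a."""
    stems = [naive_stem(w) for w in tokenize(text)]
    hits = 0
    for i, stem in enumerate(stems):
        if stem != a:
            continue
        lo = max(0, i - window)
        hi = min(len(stems), i + window + 1)
        hits += sum(1 for j in range(lo, hi) if j != i and stems[j] == b)
    return hits


sample = "Mining texts and mined text: libraries support text mining tools."
counts = stem_counts(sample)
```

With this sample, "mining" and "mined" collapse to one stem and "texts" and "text" to another, which is the effect stemming is meant to have before counting. A word cloud is then just a display of `counts` scaled by frequency.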
Re: [CODE4LIB] text mining software
NVivo is officially the only text mining tool that we support here, too. (Unofficially, bring something cool to my attention and you probably won't have to try very hard to convince me to help you set it up.) It doesn't just stem, it also handles synonyms and related terms very nicely.

Official NVivo video demoing how to do text analysis (what they call text mining) in NVivo: http://www.youtube.com/watch?v=ypo6lrpwDZ8

Julia

Julia Bauder
Social Studies and Data Services Librarian
Grinnell College Libraries
Sixth Ave.
Grinnell, IA 50112
641-269-4431

On Tue, Aug 27, 2013 at 11:07 AM, David Lowe david.l...@lib.uconn.edu wrote:

More often seen as a tool for the social sciences, NVivo from QSRI http://www.qsrinternational.com/products_nvivo.aspx has some respectable text manipulation capabilities (stemming, counting, proximity, clouds, etc.), and since it is an established tool in certain disciplines, it's either cheap or free on lots of campuses, via institutional licensing. And they have free trials as well.

--DBL
Re: [CODE4LIB] text mining software
This is still command-line, but Mallet is heavily used in the DH community: http://mallet.cs.umass.edu/. I think MONK (http://monkproject.org/) has a UI, but I'm not overly familiar with its features.

Jenn

Jenn Riley
Head, Carolina Digital Library and Archives
The University of North Carolina at Chapel Hill
http://cdla.unc.edu/
http://www.lib.unc.edu/users/jlriley
jennri...@unc.edu
(919) 843-5910

On 8/27/13 11:24 AM, Eric Lease Morgan emor...@nd.edu wrote:

What sorts of text mining software do y'all support / use in your libraries?
Re: [CODE4LIB] text mining software
Do any of these work in Hadoop using MapReduce as a programming model? It seems like Hadoop would be a natural use case for text mining and analysis.

Alan

On Aug 27, 2013, at 7:44 PM, Riley, Jenn jlri...@email.unc.edu wrote:

This is still command-line, but Mallet is heavily used in the DH community: http://mallet.cs.umass.edu/. I think MONK (http://monkproject.org/) has a UI, but I'm not overly familiar with its features.

Jenn
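To make the MapReduce question above concrete, here is a rough sketch of the programming model applied to the simplest text mining task, counting terms across a corpus. It runs in a single plain Python process and does not use Hadoop at all; the names `map_phase`, `shuffle`, `reduce_phase`, and `word_count`, and the two-document corpus, are hypothetical. On a real cluster the framework would do the grouping step and spread mappers and reducers over many machines.

```python
import re
from collections import defaultdict


def map_phase(doc_id, text):
    """Mapper: emit a (term, 1) pair for every token in one document."""
    for term in re.findall(r"[a-z]+", text.lower()):
        yield term, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term, values):
    """Reducer: sum the counts collected for one term."""
    return term, sum(values)


def word_count(corpus):
    """Run map, shuffle, and reduce over a {doc_id: text} corpus."""
    pairs = (
        pair
        for doc_id, text in corpus.items()
        for pair in map_phase(doc_id, text)
    )
    return dict(reduce_phase(t, v) for t, v in shuffle(pairs).items())


corpus = {
    "a": "Text mining in libraries",
    "b": "Mining text with Hadoop",
}
totals = word_count(corpus)
```

Each document is mapped independently of the others, which is why corpus-wide counting fits this model well. Tools that need whole-document or cross-sentence context are a less obvious fit.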
Re: [CODE4LIB] text mining software
There have been some great software recommendations in this thread that I really don't want to quibble with. What I'd like to quibble with is the software-first approach. We've all tried the software-first approach; how many of us were happy with it?

There is a standard in this area, and that standard appears to have at least two non-trivial implementations, including one from a software distributor whose name we all recognise.

SPEC: http://docs.oasis-open.org/uima/v1.0/uima-v1.0.html
APACHE UIMA: http://uima.apache.org/
GATE: http://gate.ac.uk/

Anyone have experience using the standard or these two implementations?

cheers
stuart

--
Stuart Yeates
Library Technology Services
http://www.victoria.ac.nz/library/
Re: [CODE4LIB] text mining software
I worked a lot with GATE in a previous position (not in a library, but in a research position at the Univ. of Texas at Austin). It's handy in that there is both a UI version (GATE Developer) and a set of APIs (GATE Embedded), which were the only versions I worked with. Also nice is the fact that there is reasonably good documentation from the Univ. of Sheffield (http://gate.ac.uk/), including some basic video tutorials and slides from recent training courses that you can step through (http://gate.ac.uk/wiki/TrainingCourseJune2013/).

Pretty much all the standard text-mining tools can be accessed through GATE, by creating a pipeline that incorporates the tools you need. There are also some default machine learning options if you don't want to roll your own. There's even a UIMA plug-in if you'd like to use it inside a GATE pipeline.

Danielle

--
Danielle Cunniff Plumer
dcplumer associates
www.dcplumer.com
dcplu...@gmail.com

On Tue, Aug 27, 2013 at 5:15 PM, stuart yeates stuart.yea...@vuw.ac.nz wrote:

There have been some great software recommendations in this thread, that I really don't want to quibble with. What I'd like to quibble with is the software-first approach. We've all tried the software-first approach, how many of us were happy with it?

There is a standard in this area and that standard appears to have at least two non-trivial implementations, including from one software distributor whose name we all recognise.

SPEC: http://docs.oasis-open.org/uima/v1.0/uima-v1.0.html
APACHE UIMA: http://uima.apache.org/
GATE: http://gate.ac.uk/

Anyone have experience using the standard or these two implementations?

cheers
stuart

--
Stuart Yeates
Library Technology Services
http://www.victoria.ac.nz/library/
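The pipeline idea described above, where each tool adds its own annotations to a shared document and passes it along, can be sketched in a few lines of Python. This is only an illustration of the pattern and has nothing to do with the actual GATE Embedded API (which is Java); `tokenizer`, `capitalized_tagger`, and `run_pipeline` are hypothetical names, and the tagging rule is intentionally crude.

```python
import re


def tokenizer(doc):
    """Add (start, end, text) token annotations to the document."""
    doc["tokens"] = [
        (m.start(), m.end(), m.group())
        for m in re.finditer(r"\w+", doc["text"])
    ]
    return doc


def capitalized_tagger(doc):
    """Flag capitalized tokens as candidate names (toy rule, assumption)."""
    doc["candidates"] = [t for t in doc["tokens"] if t[2][0].isupper()]
    return doc


def run_pipeline(text, stages):
    """Pass one document through each stage in order."""
    doc = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline(
    "GATE was developed at Sheffield",
    [tokenizer, capitalized_tagger],
)
names = [token[2] for token in result["candidates"]]
```

The point of the design is that stages only depend on annotations left by earlier stages, so a tagger can be swapped out or a new stage appended without touching the rest.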