Re: [CODE4LIB] text mining software

2013-09-02 Thread Aaron Coburn
Alan,
If you are looking for data mining software that runs well in Hadoop, I would 
definitely recommend looking into Apache Mahout [1]. This software is 
specifically focused on categorization and clustering, and these algorithms 
tend to work well in the distributed architecture of a Hadoop-based system. If 
you are looking for parsers, taggers, or tokenizers, then a different system 
(GATE, OpenNLP, or UIMA) would be more appropriate.

-Aaron

[1] http://mahout.apache.org
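To make "MapReduce as a programming model" concrete for a text mining task, here is a minimal sketch of a term-frequency job. It is plain single-process Python that mimics the map, shuffle, and reduce phases. It is not the Hadoop or Mahout API, and the function names are illustrative only. On a real cluster, Hadoop would run many mappers and reducers in parallel and perform the shuffle itself.

```python
import re
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Mapper: emit (term, 1) for every lowercase word token in one document."""
    for token in re.findall(r"[a-z']+", text.lower()):
        yield token, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term, counts):
    """Reducer: sum the partial counts for a single term."""
    return term, sum(counts)


def run_job(corpus):
    """Run map -> shuffle -> reduce over a dict of {doc_id: text}."""
    intermediate = chain.from_iterable(
        map_phase(doc_id, text) for doc_id, text in corpus.items()
    )
    grouped = shuffle(intermediate)
    return dict(reduce_phase(term, counts) for term, counts in grouped.items())


if __name__ == "__main__":
    corpus = {
        "a": "Text mining in the library",
        "b": "Mining text with Hadoop",
    }
    print(sorted(run_job(corpus).items()))
```

Because each mapper sees only one document and each reducer sees only one term's counts, the work splits naturally across machines, which is part of why this style suits large corpora.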


On Aug 27, 2013, at 7:47 PM, Alan Darnell alan.darn...@utoronto.ca wrote:

 Do any of these work in Hadoop using MapReduce as a programming model? It 
 seems like text mining and analysis would be a natural use case for Hadoop.
 
 Alan
 


[CODE4LIB] text mining software

2013-08-27 Thread Eric Lease Morgan
What sorts of text mining software do y'all support / use in your libraries?

We here in the Hesburgh Libraries at the University of Notre Dame have all but 
opened a place called the Center for Digital Scholarship. We are (or will be) 
providing a number of different services to a number of different audiences. 
These services include, but are not necessarily limited to:

 * data management consultation
 * data analysis and visualization
 * geographic information systems support
 * text mining investigations
 * referrals to other centers across campus

I am expected to support the text mining investigations. I have traditionally 
used open source tools to do my work. Many of these tools require some sort of 
programming in order to exploit them. To some degree I am expected to mount text 
mining software on our local Windows and Macintosh computers here in our Center. 
I am familiar with the lists of tools available at Bamboo as well as 
Hermeneuti.ca. [0, 1] TAPoRware is good too, but a bit long in the tooth. [2]

Do you know of other sets of tools to choose from? Are you familiar with SAS® 
Text Analytics, STATISTICA Data Miner, or RapidMiner? [3, 4, 5]

[0] Bamboo Dirt - http://dirt.projectbamboo.org
[1] Hermeneuti.ca - http://hermeneuti.ca/voyeur/tools
[2] TAPoRware - http://taporware.ualberta.ca
[3] Text Analytics - http://www.sas.com/text-analytics/
[4] Data Miner - http://www.statsoft.com/Products/STATISTICA/Data-Miner/
[5] RapidMiner - http://rapid-i.com/content/view/181/190/

--
Eric Lease Morgan, Digital Initiatives Librarian
Hesburgh Libraries
University of Notre Dame

574/631-8604


Re: [CODE4LIB] text mining software

2013-08-27 Thread Pottinger, Hardy J.
Hi, Eric, I don't have any experience in this field, but I went looking a
while ago when the topic came up, and these two links are in my notes for
further exploration, if the topic ever comes around again:

http://wordseer.berkeley.edu/

http://mininghumanities.com/


May they serve you well.

--
HARDY POTTINGER pottinge...@umsystem.edu
University of Missouri Library Systems
http://lso.umsystem.edu/~pottingerhj/
https://MOspace.umsystem.edu/
A child who does not play is not a child,
but the man who doesn't play has lost forever
the child who lived in him and who he will
miss terribly. 
--Pablo Neruda







Re: [CODE4LIB] text mining software

2013-08-27 Thread David Lowe
More often seen as a tool for the social sciences, NVivo from QSRI 
(http://www.qsrinternational.com/products_nvivo.aspx) has some respectable 
text manipulation capabilities (stemming, counting, proximity, clouds, etc.), 
and since it is an established tool in certain disciplines, it's either cheap 
or free on lots of campuses via institutional licensing. And they have free 
trials as well.

--DBL
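The stemming, counting, and proximity features listed above can be illustrated with a short sketch. The suffix-stripping rule here is a hypothetical toy, not NVivo's method and not the Porter stemmer, and the default proximity window of 3 tokens is an arbitrary assumption.

```python
import re
from collections import Counter

# Hypothetical, deliberately crude suffix list (checked in this order).
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, but only if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stems(text):
    """Lowercase the text, split on non-letters, and stem each token."""
    return [crude_stem(t) for t in re.findall(r"[a-z]+", text.lower())]


def stem_counts(text):
    """Counting: frequency of each stem, so 'mining' and 'mined' tally together."""
    return Counter(stems(text))


def near(text, a, b, window=3):
    """Proximity: True if stems a and b occur within `window` tokens of each other."""
    seq = stems(text)
    pos_a = [i for i, s in enumerate(seq) if s == a]
    pos_b = [i for i, s in enumerate(seq) if s == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


sample = "Mining texts is fun; we mined the text."
print(stem_counts(sample).most_common(2))
print(near(sample, "min", "text", window=1))
```

A real stemmer handles far more cases (for example, this toy leaves "library" and "libraries" apart), which is one reason packaged tools are attractive here.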



-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
Pottinger, Hardy J.
Sent: Tuesday, August 27, 2013 11:51 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] text mining software





Re: [CODE4LIB] text mining software

2013-08-27 Thread Julia Bauder
NVivo is officially the only text mining tool that we support here, too.
(Unofficially, bring something cool to my attention and you probably won't
have to try very hard to convince me to help you set it up.) It doesn't
just stem; it also handles synonyms and related terms very nicely.

Official NVivo video demoing how to do text analysis (what they call text
mining) in NVivo:
http://www.youtube.com/watch?v=ypo6lrpwDZ8

Julia


Julia Bauder
Social Studies and Data Services Librarian
Grinnell College Libraries
 Sixth Ave.
Grinnell, IA 50112

641-269-4431
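Synonym handling of the kind described above can be approximated by folding variant terms onto a preferred term before counting. This is a minimal sketch with a hypothetical, hand-made synonym table; it says nothing about how NVivo actually implements the feature.

```python
import re
from collections import Counter

# Hypothetical synonym table mapping each variant to a preferred term.
# A real tool would draw these from a thesaurus or a user-built list.
SYNONYMS = {
    "car": "automobile",
    "auto": "automobile",
    "motorcar": "automobile",
}


def grouped_counts(text, synonyms=SYNONYMS):
    """Count words after folding each listed synonym onto its preferred term."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(synonyms.get(w, w) for w in words)


print(grouped_counts("The car and the auto: one automobile."))
```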



On Tue, Aug 27, 2013 at 11:07 AM, David Lowe david.l...@lib.uconn.edu wrote:

 More often seen as a tool for the social sciences, NVivo from QSRI
 http://www.qsrinternational.com/products_nvivo.aspx has some respectable
 text manipulation capabilities (stemming, counting, proximity, clouds,
 etc.), and since it is an established tool in certain disciplines, it's
 either cheap or free on lots of campuses, via institutional licensing.  And
 they have free trials as well.

 --DBL






Re: [CODE4LIB] text mining software

2013-08-27 Thread Riley, Jenn
This is still command-line, but Mallet is heavily used in the DH
community: http://mallet.cs.umass.edu/. I think MONK
(http://monkproject.org/) has a UI, but I'm not overly familiar with its
features.

Jenn


Jenn Riley
Head, Carolina Digital Library and Archives
The University of North Carolina at Chapel Hill
http://cdla.unc.edu/
http://www.lib.unc.edu/users/jlriley

jennri...@unc.edu
(919) 843-5910







Re: [CODE4LIB] text mining software

2013-08-27 Thread Alan Darnell
Do any of these work in Hadoop using MapReduce as a programming model? It seems 
like text mining and analysis would be a natural use case for Hadoop.

Alan

On Aug 27, 2013, at 7:44 PM, Riley, Jenn jlri...@email.unc.edu wrote:

 This is still command-line, but Mallet is heavily used in the DH
 community: http://mallet.cs.umass.edu/. I think MONK
 (http://monkproject.org/) has a UI, but I'm not overly familiar with its
 features.
 
 Jenn
 
 


Re: [CODE4LIB] text mining software

2013-08-27 Thread stuart yeates
There have been some great software recommendations in this thread, which 
I really don't want to quibble with. What I'd like to quibble with is 
the software-first approach. We've all tried the software-first 
approach; how many of us were happy with it?

There is a standard in this area, and that standard appears to have at 
least two non-trivial implementations, including one from a software 
distributor whose name we all recognise.

SPEC: http://docs.oasis-open.org/uima/v1.0/uima-v1.0.html
APACHE UIMA: http://uima.apache.org/
GATE: http://gate.ac.uk/

Does anyone have experience using the standard or these two implementations?

cheers
stuart

--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/


Re: [CODE4LIB] text mining software

2013-08-27 Thread danielle plumer
I worked a lot with GATE in a previous position (not in a library, but in a
research position at the Univ. of Texas at Austin). It's handy in that
there is both a UI version (GATE Developer) and a set of APIs (GATE
Embedded), which are the only two versions I worked with. Also nice is the
fact that there is reasonably good documentation from the Univ. of
Sheffield (http://gate.ac.uk/), including some basic video tutorials and
slides from recent training courses that you can step through
(http://gate.ac.uk/wiki/TrainingCourseJune2013/).

Pretty much all the standard text-mining tools can be accessed through
GATE by creating a pipeline that incorporates the tools you need. There
are also some default machine learning options if you don't want to roll
your own. There's even a UIMA plug-in if you'd like to use it inside a GATE
pipeline.

Danielle

-- 

Danielle Cunniff Plumer
dcplumer associates
www.dcplumer.com
dcplu...@gmail.com
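The pipeline idea described above can be sketched in a few lines. This is not GATE's API (GATE Embedded is a Java library), and every class and function name below is hypothetical. The sketch only illustrates the general pattern: each processing step reads a document and adds stand-off annotations (a type plus start/end character offsets) that later steps can build on.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str          # annotation type, e.g. "Token" or "Lookup"
    start: int         # character offset where the span begins
    end: int           # character offset just past the span
    features: dict = field(default_factory=dict)


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)


def tokenizer(doc):
    """Step 1: add a Token annotation for every run of word characters."""
    for m in re.finditer(r"\w+", doc.text):
        doc.annotations.append(Annotation("Token", m.start(), m.end()))


def gazetteer(terms):
    """Build a step that adds Lookup annotations over Tokens found in `terms`."""
    wanted = {t.lower() for t in terms}

    def step(doc):
        tokens = [a for a in doc.annotations if a.kind == "Token"]
        for tok in tokens:
            word = doc.text[tok.start:tok.end]
            if word.lower() in wanted:
                doc.annotations.append(
                    Annotation("Lookup", tok.start, tok.end, {"term": word})
                )

    return step


def run_pipeline(doc, steps):
    """Apply each step in order; later steps see earlier steps' annotations."""
    for step in steps:
        step(doc)
    return doc


doc = run_pipeline(
    Document("GATE and UIMA annotate text."),
    [tokenizer, gazetteer({"gate", "uima"})],
)
print([doc.text[a.start:a.end] for a in doc.annotations if a.kind == "Lookup"])
```

Keeping annotations separate from the text, rather than rewriting it, lets independent tools layer their output on the same document.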

